We gratefully acknowledge support from
the Simons Foundation and member institutions.

Computer Science

New submissions

[ total of 572 entries: 1-572 ]

New submissions for Mon, 6 May 24

[1]  arXiv:2405.01540 [pdf, other]
Title: Universal Imitation Games
Comments: 98 pages. arXiv admin note: substantial text overlap with arXiv:2402.18732
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In 1950, Alan Turing proposed a framework called an imitation game to decide whether a machine could think. Using mathematics developed largely after Turing, namely category theory, we analyze a broader class of universal imitation games (UIGs), which includes static, dynamic, and evolutionary games. In static games, the participants are in a steady state. In dynamic UIGs, "learner" participants try to imitate "teacher" participants over the long run. In evolutionary UIGs, the participants compete against each other in an evolutionary game, and participants can go extinct and be replaced by others with higher fitness. We use the framework of category theory, in particular two influential results by Yoneda, to characterize each type of imitation game. Universal properties in categories are defined by initial and final objects. We characterize dynamic UIGs where participants learn by inductive inference as initial algebras over well-founded sets, and contrast them with participants learning by coinductive inference over the final coalgebra of non-well-founded sets. We briefly discuss the extension of our categorical framework for UIGs to imitation games on quantum computers.
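The contrast between inductive inference (initial algebras over well-founded sets) and coinductive inference (final coalgebras of non-well-founded sets) has a familiar programming counterpart: a fold consumes a finite structure, while an unfold produces a possibly infinite one. The Python sketch below illustrates only that analogy; it is not the paper's categorical construction, and `fold_list` and `unfold_stream` are hypothetical names.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
S = TypeVar("S")


def fold_list(nil: B, cons: Callable[[A, B], B], xs: List[A]) -> B:
    """Catamorphism: the unique map out of the initial algebra of finite lists.

    Structural recursion terminates because every finite list is well-founded.
    """
    result = nil
    for x in reversed(xs):
        result = cons(x, result)
    return result


def unfold_stream(step: Callable[[S], Optional[Tuple[A, S]]], seed: S) -> Iterator[A]:
    """Anamorphism: the unique map into the final coalgebra of possibly infinite lists.

    The output may never end; we can only ever observe finite prefixes of it.
    """
    state = seed
    while True:
        nxt = step(state)
        if nxt is None:
            return
        value, state = nxt
        yield value


# Inductive: summing a finite list by structural recursion.
total = fold_list(0, lambda x, acc: x + acc, [1, 2, 3, 4])

# Coinductive: an infinite stream of powers of two, observed only through a prefix.
powers = list(islice(unfold_stream(lambda n: (n, 2 * n), 1), 5))
print(total, powers)  # 10 [1, 2, 4, 8, 16]
```

The fold always terminates because its input is finite; the unfold is safe only because `islice` observes a finite prefix, which mirrors how coinductive objects are studied through their observable behavior.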

[2]  arXiv:2405.01543 [pdf, other]
Title: Transforming Software Development with Generative AI: Empirical Insights on Collaboration and Workflow
Comments: To be published in Generative AI for Effective Software Development - this https URL
Subjects: Software Engineering (cs.SE)

Generative AI (GenAI) has fundamentally changed how knowledge workers, such as software developers, solve tasks and collaborate to build software products. The introduction of tools such as ChatGPT and Copilot has created new opportunities to assist and augment software developers across a variety of problems. We conducted an empirical study involving interviews with 13 data scientists, managers, developers, designers, and frontend developers to investigate how they use GenAI. Our study reveals that ChatGPT signifies a paradigm shift in the workflow of software developers. The technology empowers developers by enabling them to work more efficiently, speeding up the learning process, and increasing motivation by reducing tedious and repetitive tasks. Moreover, our results indicate a change in team collaboration: software engineers turn to GenAI for help instead of asking co-workers, which affects the learning loop in agile teams.

[3]  arXiv:2405.01544 [pdf, ps, other]
Title: Transformational Outsourcing in IT Project Management
Comments: 17 pages, 4 Figures
Subjects: Software Engineering (cs.SE)

Transformational outsourcing represents a strategic shift from traditional cost-focused outsourcing to a deeper, more collaborative approach. It involves partnering with service providers not only to accomplish routine tasks but also to drive substantial organizational change and innovation. The report discusses why IT companies should pursue transformational outsourcing, highlighting its role in achieving strategic growth, competitive advantage, and cost-efficiency while enabling a focus on core competencies. It explores the pros and cons of IT outsourcing, emphasizing benefits such as cost savings, access to global talent, and scalability, as well as challenges related to quality, control, and data security. Additionally, the report identifies critical reasons why outsourcing efforts may fail to achieve organizational goals, including poor vendor selection, communication issues, unclear objectives, resistance to change, and inadequate risk management. When carefully planned and executed, transformational outsourcing offers IT companies a pathway to greater efficiency and fosters innovation and competitiveness in a rapidly evolving technology landscape.

[4]  arXiv:2405.01545 [pdf, ps, other]
Title: Analysing software failure using runtime verification and LTL
Subjects: Software Engineering (cs.SE)

A self-healing software system is an advanced program or system designed to detect, diagnose, and automatically recover from faults or errors without human intervention. Such systems are typically employed in mission-critical applications where downtime can have significant financial or operational consequences. Failure detection is a key step in any self-healing system. In this research, we propose a method based on runtime verification to diagnose four types of errors at the component level. Simulations on mRUBIS show that the proposed method detects the occurrence of failures with the required efficiency.
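The listing does not say which temporal properties the paper monitors. As a generic illustration of runtime verification with LTL, the sketch below checks a recorded execution trace against an LTL formula using finite-trace (LTLf) semantics. The proposition names `failed` and `repaired` are hypothetical, and the recursive evaluator is a textbook construction, not the method evaluated on mRUBIS.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Union

# One observation per step: the set of atomic propositions true at that step.
Trace = Sequence[FrozenSet[str]]


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:
    arg: "Formula"


@dataclass(frozen=True)
class Eventually:
    arg: "Formula"


@dataclass(frozen=True)
class Always:
    arg: "Formula"


Formula = Union[Atom, Not, And, Implies, Next, Eventually, Always]


def holds(f: Formula, trace: Trace, i: int = 0) -> bool:
    """Evaluate an LTL formula on a finite trace (LTLf semantics) from position i."""
    if isinstance(f, Atom):
        return f.name in trace[i]
    if isinstance(f, Not):
        return not holds(f.arg, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Implies):
        return (not holds(f.left, trace, i)) or holds(f.right, trace, i)
    if isinstance(f, Next):
        # Strong next: false at the last position of a finite trace.
        return i + 1 < len(trace) and holds(f.arg, trace, i + 1)
    if isinstance(f, Eventually):
        return any(holds(f.arg, trace, j) for j in range(i, len(trace)))
    if isinstance(f, Always):
        return all(holds(f.arg, trace, j) for j in range(i, len(trace)))
    raise TypeError(f"unknown formula: {f!r}")


# Hypothetical component-level property: every failure is eventually repaired.
recovers = Always(Implies(Atom("failed"), Eventually(Atom("repaired"))))

ok_trace = [frozenset({"running"}), frozenset({"failed"}), frozenset({"repaired"})]
bad_trace = [frozenset({"running"}), frozenset({"failed"}), frozenset({"failed"})]
print(holds(recovers, ok_trace), holds(recovers, bad_trace))  # True False
```

A production monitor would more likely compile the formula into an automaton and consume events incrementally; the recursive evaluator trades that efficiency for readability.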

[5]  arXiv:2405.01546 [pdf, ps, other]
Title: It Will Never Work in Theory
Comments: 4 pages, 2 tables, to appear in "IEEE Software"
Subjects: Software Engineering (cs.SE)

We have been trying to get software engineering researchers and practitioners to talk to one another for over a decade. This paper describes what we have done, assesses our impact, and recommends an approach that we hope will have greater success.

[6]  arXiv:2405.01549 [pdf, ps, other]
Title: Exploring Conceptual Modeling Metaphysics: Existence Containers, Leibniz's Monads and Avicenna's Essence
Authors: Sabah Al-Fedaghi
Comments: 11 pages, 27 Figures
Subjects: Software Engineering (cs.SE)

Requirement specifications in software engineering involve developing a conceptual model of a target domain. The model is based on an ontological exploration of things in reality. Many aspects of this process are closely tied to problems in metaphysics, the field of inquiry into what reality fundamentally is. According to some researchers, metaphysicians try to develop an account of the world that properly conceptualizes the way it is, and software design is similar. Notions such as classes, object orientation, properties, instantiation, and algorithms are metaphysical concepts developed many years ago. Exploring the metaphysics of such notions aims to establish quality assurance through an objective foundation that is not subject to misapprehensions and conventions. Much metaphysical work might best be understood as a model-building process. Here, a model is viewed as a hypothetical structure that we describe and investigate to understand more complex, real-world systems. The purpose of this paper is to enhance understanding of the metaphysical origins of conceptual modeling, as exemplified by a specific proposed high-level model called thinging machines (TMs). The focus is on thimacs (things/machines) as a single category of TM modeling in the context of a two-phase world of staticity and dynamics. The general idea of this reality has been inspired by Deleuze's notion of the virtual and is related to the classical notions of Leibniz's monads and Avicenna's essence. The analysis of TMs leads to several interesting results about a thimac's nature at the static and existence levels.

[7]  arXiv:2405.01553 [pdf, ps, other]
Title: Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Recently, Large Language Models (LLMs) have gained a lot of attention in the Software Engineering (SE) community. LLMs, or variants of them pre-trained on code, are used for many SE tasks. A main approach for adapting LLMs to a downstream task is to fine-tune them. However, with LLMs having billions of parameters, fine-tuning the whole model is not practical. An alternative is Parameter Efficient Fine-Tuning (PEFT), in which the model parameters are frozen and only a small number of added parameters are trained. Although LLMs are widely used for programming languages such as Python and Java, their capability for low-resource languages is limited. In this work, we empirically study two PEFT methods, LoRA and Compacter, on CodeT5 and CodeLlama. We will assess their performance compared to fully fine-tuned models, whether they can be used for knowledge transfer from natural language models to code (using T5 and Llama models), and their ability to adapt the learned knowledge to an unseen language. For the unseen language, we aim to study R, as it has a large community. Adaptability at lower computational cost makes LLMs accessible in scenarios where heavy computational resources are not available. Moreover, studying R opens new opportunities for using LLMs with other languages. We anticipate that our findings will showcase the capabilities of PEFT for code LLMs on R and reveal areas for improvement.
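To make the PEFT idea concrete, the sketch below shows the core mechanism of LoRA, one of the two methods studied: the pre-trained weight `W` stays frozen and only a low-rank update `B A` is trainable. It is a plain-Python illustration, not code from CodeT5, CodeLlama, or any PEFT library; the class and helper names, the initialisation scale, and the 4096 x 4096 matrix size used for the parameter count are assumptions chosen for illustration, and no training loop is shown.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Linear layer y = W x + (alpha / r) * B (A x) with W frozen.

    Only A (r x d_in) and B (d_out x r) would be trained during fine-tuning.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0) -> None:
        self.weight = weight  # frozen pre-trained weights, never updated
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A starts small and random, B starts at zero, so B A = 0 and the
        # adapted layer initially behaves exactly like the pre-trained one.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def trainable_parameters(self) -> int:
        return sum(len(row) for row in self.a) + sum(len(row) for row in self.b)


def lora_parameter_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_out + d_in)


layer = LoRALinear([[1.0, 0.0], [0.0, 2.0]], rank=1)
print(layer.forward([3.0, 4.0]))  # [3.0, 8.0], identical to W x at initialisation
print(4096 * 4096, lora_parameter_count(4096, 4096, 8))  # 16777216 65536
```

For a 4096 x 4096 matrix with rank 8, the adapter trains about 0.4% as many parameters as full fine-tuning of that matrix, which is the kind of saving that makes adapting very large code models feasible on modest hardware.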

[8]  arXiv:2405.01554 [pdf, other]
Title: Early-stage detection of cognitive impairment by hybrid quantum-classical algorithm using resting-state functional MRI time-series
Comments: 28 pages, 10 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Following recent developments in quantum machine learning techniques, the literature has reported several quantum machine learning algorithms for disease detection. This study explores the application of a hybrid quantum-classical algorithm to classify region-of-interest time-series data obtained from resting-state functional magnetic resonance imaging of patients with early-stage cognitive impairment, motivated by the importance of cognitive decline in dementia and aging. Our hybrid algorithm combines classical one-dimensional convolutional layers with quantum convolutional neural networks. In classical simulation, the proposed hybrid algorithms achieved higher balanced accuracies than classical convolutional neural networks under similar training conditions. Moreover, based on model performance, nine of the 116 brain regions (left precentral gyrus, right superior temporal gyrus, left rolandic operculum, right rolandic operculum, left parahippocampus, right hippocampus, left medial frontal gyrus, right cerebellum crus, and cerebellar vermis) were found to be relatively effective for classification. The associations of these nine regions with cognitive decline, as reported in previous studies, were further validated through seed-based functional connectivity analysis. We confirmed both the performance improvement brought by the quantum convolutional neural network and the neuroscientific validity of the brain regions identified by our hybrid quantum-classical model.

[9]  arXiv:2405.01555 [pdf, ps, other]
Title: Digital Twin-Empowered Task Assignment in Aerial MEC Network: A Resource Coalition Cooperation Approach with Generative Model
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

To meet the demands for ubiquitous communication and temporary edge computing in 6G networks, aerial mobile edge computing (MEC) networks have been envisioned as a new paradigm. However, dynamic user requests pose challenges for task assignment strategies. Most existing research assumes that the strategy is deployed on ground-based stations or UAVs, which is ineffective in environments lacking infrastructure and a continuous energy supply. Moreover, the resource mutual exclusion problem of dynamic task assignment has not been effectively solved. To this end, we introduce the digital twin (DT) into the aerial MEC network to study a resource coalition cooperation approach with a generative model (GM), which provides a preliminary coalition structure for the coalition game. Specifically, we propose a novel network framework composed of an application plane, a physical plane, and a virtual plane. The task assignment problem is then simplified to a convex optimization problem with linear constraints. We further propose a resource coalition cooperation approach based on a transferable utility (TU) coalition game to obtain an approximately optimal solution. Numerical results confirm the effectiveness of the proposed approach in terms of energy consumption and resource utilization.
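For readers unfamiliar with transferable-utility (TU) games: the worth of a coalition is a single number that can be split freely among its members. The sketch below computes the Shapley value, one standard way to divide that worth by averaging each player's marginal contributions. The abstract does not say which allocation rule the paper uses, so this is an illustration of TU games in general; the three-UAV characteristic function is invented.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

Worth = Callable[[FrozenSet[str]], float]


def shapley_values(players: Sequence[str], v: Worth) -> Dict[str, float]:
    """Exact Shapley value of a TU game: each player's marginal contribution
    v(S + {p}) - v(S), averaged over every order in which players can join.
    Enumerates n! orders, so it is only practical for a handful of players."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += v(joined) - v(coalition)
            coalition = joined
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical worth (e.g., tasks completed) of each coalition of three UAVs.
worth = {
    frozenset(): 0.0,
    frozenset({"uav1"}): 1.0,
    frozenset({"uav2"}): 1.0,
    frozenset({"uav3"}): 2.0,
    frozenset({"uav1", "uav2"}): 4.0,
    frozenset({"uav1", "uav3"}): 3.0,
    frozenset({"uav2", "uav3"}): 3.0,
    frozenset({"uav1", "uav2", "uav3"}): 6.0,
}
players = ["uav1", "uav2", "uav3"]
phi = shapley_values(players, worth.__getitem__)
print(phi)  # {'uav1': 2.0, 'uav2': 2.0, 'uav3': 2.0}
```

The shares always sum to the worth of the grand coalition (here 6), which is the efficiency property that makes the Shapley value a natural reference point for cooperative resource sharing.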

[10]  arXiv:2405.01556 [pdf, other]
Title: Semantically Aligned Question and Code Generation for Automated Insight Generation
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, quickly understand the potential value of new and unfamiliar data. Unfortunately, when large language models produce automated insights, they can generate code that does not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data, along with the corresponding code to answer those questions. Through an empirical study on data from Open-WikiTable, we then show that embeddings can effectively be used to filter out semantically unaligned question-code pairs. Additionally, we found that generating questions and code together yields more diverse questions.
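Embedding-based filtering can be pictured as a similarity threshold on (question, code) pairs. The sketch below assumes cosine similarity and a fixed threshold; the abstract specifies neither the embedding model nor the scoring rule, so the toy 3-dimensional vectors, the 0.8 threshold, and the function names are all hypothetical stand-ins.

```python
import math
from typing import List, Sequence, Tuple

# (question, code, question_embedding, code_embedding)
Candidate = Tuple[str, str, Sequence[float], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_aligned(candidates: List[Candidate], threshold: float) -> List[Tuple[str, str]]:
    """Keep only (question, code) pairs whose embeddings are similar enough."""
    return [(q, c) for q, c, q_emb, c_emb in candidates if cosine(q_emb, c_emb) >= threshold]


# Toy 3-d vectors stand in for real embedding-model outputs.
candidates: List[Candidate] = [
    ("Which country won the most medals?", "df.sort_values('medals').tail(1)",
     [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),
    ("What is the average population?", "df['area'].max()",
     [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]),
]
kept = filter_aligned(candidates, threshold=0.8)
print(kept)  # only the medals question survives
```

The second pair is dropped because its code answers a different question (maximum area rather than average population), which is exactly the kind of misalignment the filtering step targets.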

[11]  arXiv:2405.01557 [pdf, other]
Title: An Experimental Study on the Rashomon Effect of Balancing Methods in Imbalanced Classification
Comments: 16 pages, 6 figures
Subjects: Machine Learning (cs.LG)

Predictive models may generate biased predictions when classifying imbalanced datasets. This happens when the model favors the majority class, leading to poor performance in predicting the minority class. To address this issue, balancing or resampling methods are critical pre-processing steps in the modeling process. However, the effectiveness of these methods has been debated and questioned in recent years. In particular, during model selection many candidate models may exhibit very similar predictive performance, a phenomenon called the Rashomon effect. Selecting one of them without considering predictive multiplicity, that is, the case where such models yield conflicting predictions for the same sample, may lead to losing the benefit of using another model. In this study, in addition to the existing debates, the impact of balancing methods on predictive multiplicity is examined through the Rashomon effect. This matters because blindly selecting a model from a set of approximately equally accurate models is risky and may lead to serious problems in model selection, validation, and explanation. To tackle this matter, we conducted experiments on real datasets to observe the impact of balancing methods on predictive multiplicity through the Rashomon effect. Our findings show that balancing methods inflate predictive multiplicity and yield varying results. To monitor the trade-off between performance and predictive multiplicity so that the modeling process can be conducted responsibly, we propose using the extended performance-gain plot for the Rashomon effect.
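The abstract does not define its multiplicity measures. A common way to quantify predictive multiplicity, sketched below under that assumption, is to form the Rashomon set (models whose accuracy is within ε of the best) and then measure ambiguity (the share of samples on which some member disagrees with the reference model) and discrepancy (the largest share on which any single member disagrees). Model names, predictions, accuracies, and ε are made up.

```python
from typing import Dict, List, Sequence


def rashomon_set(accuracy: Dict[str, float], epsilon: float) -> List[str]:
    """Models whose accuracy is within epsilon of the best one."""
    best = max(accuracy.values())
    return [name for name, acc in accuracy.items() if acc >= best - epsilon]


def ambiguity(preds: Dict[str, Sequence[int]], members: List[str], reference: str) -> float:
    """Share of samples on which at least one member disagrees with the reference model."""
    ref = preds[reference]
    conflicts = sum(
        1 for i, r in enumerate(ref) if any(preds[m][i] != r for m in members)
    )
    return conflicts / len(ref)


def discrepancy(preds: Dict[str, Sequence[int]], members: List[str], reference: str) -> float:
    """Largest share of samples on which a single member disagrees with the reference."""
    ref = preds[reference]
    return max(sum(p != r for p, r in zip(preds[m], ref)) / len(ref) for m in members)


# Hypothetical accuracies and test-set predictions (e.g., after resampling).
accuracy = {"m1": 0.90, "m2": 0.89, "m3": 0.80}
preds = {"m1": [1, 0, 1, 1, 0], "m2": [1, 1, 1, 0, 0], "m3": [0, 0, 0, 0, 0]}

members = rashomon_set(accuracy, epsilon=0.02)  # ['m1', 'm2']
reference = max(accuracy, key=accuracy.get)     # 'm1'
print(ambiguity(preds, members, reference), discrepancy(preds, members, reference))  # 0.4 0.4
```

Although `m1` and `m2` differ by only one point of accuracy, they disagree on 40% of the samples, which illustrates why picking one of them blindly can be risky.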

[12]  arXiv:2405.01558 [pdf, other]
Title: Configurable Learned Holography
Comments: 14 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optics (physics.optics)

In the pursuit of advancing holographic display technology, we face a unique yet persistent roadblock: the inflexibility of learned holography in adapting to various hardware configurations. This is due to variations in the complex optical components and system settings of existing holographic displays. Although emerging learned approaches have enabled rapid and high-quality hologram generation, any alteration in display hardware still requires retraining the model. Our work introduces a configurable learned model that interactively computes 3D holograms from RGB-only 2D images for a variety of holographic displays. The model can be conditioned on predefined hardware parameters of existing holographic displays, such as working wavelengths, pixel pitch, propagation distance, and peak brightness, without retraining. In addition, our model accommodates various hologram types, including conventional single-color holograms and emerging multi-color holograms that simultaneously use multiple color primaries in holographic displays. Notably, for the first time in the literature, our hologram computations exploit the correlation between depth estimation and 3D hologram synthesis tasks within the learning domain. We employ knowledge distillation via a student-teacher learning strategy to streamline our model for interactive performance. Our model achieves up to a 2x speed improvement over state-of-the-art models while consistently generating high-quality 3D holograms across different hardware configurations.
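Student-teacher distillation for an image-to-hologram model can be framed as an output-matching loss: the compact student is trained to reproduce the larger teacher's outputs, optionally blended with a ground-truth term. The sketch below assumes mean squared error and a blending weight `alpha`; the abstract does not state the loss actually used, and the two-element vectors merely stand in for flattened hologram outputs.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized flattened outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    target: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Output-matching distillation: alpha weights imitation of the (frozen)
    teacher, 1 - alpha weights agreement with the ground-truth target."""
    return alpha * mse(student_out, teacher_out) + (1.0 - alpha) * mse(student_out, target)


# Toy 2-pixel "holograms" (e.g., phase values) from a student, a teacher, and ground truth.
loss = distillation_loss([0.0, 1.0], [0.5, 1.0], [0.0, 2.0], alpha=0.5)
print(loss)  # 0.3125
```

Setting `alpha=1.0` yields pure imitation of the teacher, which is useful when the teacher's outputs are the only supervision available for a new input distribution.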

[13]  arXiv:2405.01559 [pdf, other]
Title: Untangling Knots: Leveraging LLM for Error Resolution in Computational Notebooks
Comments: accepted at 1st ACM CHI Workshop on Human-Notebook Interactions
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)

Computational notebooks have become indispensable tools for research-related development, offering unprecedented interactivity and flexibility in the development process. However, these benefits come at the cost of reproducibility and an increased potential for bugs. There are many tools for bug fixing; however, they generally target classical linear code. With the rise of code-fluent Large Language Models, a new stream of smart bug-fixing tools has emerged. However, these tools remain difficult to apply to non-linear computational notebooks. In this paper, we propose a potential solution for resolving errors in computational notebooks via an iterative LLM-based agent. We discuss the questions raised by this approach and share a novel dataset of computational notebooks containing bugs to facilitate research on the proposed approach.

[14]  arXiv:2405.01560 [pdf, ps, other]
Title: Copyright related risks in the creation and use of ML/AI systems
Authors: Daniel M. German
Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY)

This paper summarizes the current copyright-related risks that Machine Learning (ML) and Artificial Intelligence (AI) systems (including Large Language Models, or LLMs) incur. These risks affect different stakeholders: owners of the copyright of the training data, users of ML/AI systems, creators of trained models, and operators of AI systems. This paper also provides an overview of ongoing legal cases in the United States related to these risks.

[15]  arXiv:2405.01561 [pdf, ps, other]
Title: Rapid Mobile App Development for Generative AI Agents on MIT App Inventor
Journal-ref: Journal of advances in information science and technology 2(3) 1-8, March 2024
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The evolution of Artificial Intelligence (AI) stands as a pivotal force shaping our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.

[16]  arXiv:2405.01562 [pdf, other]
Title: Discrete Event Simulation: It's Easy with SimPy!
Authors: Dmitry Zinoviev
Comments: 19 pages; 5 figures; first published in PragPub in 2018
Subjects: Mathematical Software (cs.MS); Multiagent Systems (cs.MA)

This paper introduces the practicalities and benefits of using SimPy, a discrete event simulation (DES) module written in Python, for modeling and simulating complex systems. Through a step-by-step exploration of the classical Dining Philosophers Problem, we demonstrate how SimPy enables the efficient construction of discrete event models, emphasizing system states, transitions, and event handling. We extend the scenario to introduce resources, such as chopsticks, to model contention and deadlock conditions, and showcase SimPy's capabilities in managing these scenarios. Furthermore, we explore the integration of SimPy with other Python libraries for statistical analysis, illustrating how simulation results inform system design and optimization. The versatility of SimPy is further highlighted through additional modeling scenarios, including resource constraints and customer service interactions, providing insights into the process of building, debugging, simulating, and optimizing models for a wide range of applications. This paper aims to make DES accessible to practitioners and researchers alike, emphasizing the ease with which complex simulations can be constructed, analyzed, and visualized using SimPy and the broader Python ecosystem.
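The mechanism underneath any discrete event simulator is a future-event list processed in time order. The stdlib-only sketch below shows that mechanism for a single-server customer-service queue; it deliberately does not use SimPy, which expresses the same model far more concisely with generator-based processes, `simpy.Resource`, and `env.timeout`. The scenario, function name, and numbers are illustrative and not taken from the paper.

```python
import heapq
from collections import deque
from typing import Deque, List, Tuple

DEPARTURE, ARRIVAL = 0, 1  # departures sort first when events share a timestamp


def simulate_single_server(arrivals: List[float], service_times: List[float]) -> List[float]:
    """Event-driven simulation of one server with a FIFO waiting line.

    Returns each customer's waiting time (service start minus arrival time).
    """
    # Future event list: (time, event kind, customer index), popped in time order.
    events: List[Tuple[float, int, int]] = [(t, ARRIVAL, i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    line: Deque[int] = deque()
    busy = False
    waits = [0.0] * len(arrivals)

    while events:
        now, kind, customer = heapq.heappop(events)
        if kind == ARRIVAL:
            line.append(customer)
        else:
            busy = False  # the departing customer releases the server
        if not busy and line:
            nxt = line.popleft()
            waits[nxt] = now - arrivals[nxt]
            busy = True
            heapq.heappush(events, (now + service_times[nxt], DEPARTURE, nxt))
    return waits


print(simulate_single_server([0.0, 1.0, 2.0], [3.0, 3.0, 1.0]))  # [0.0, 2.0, 4.0]
```

Simulated time jumps directly from one event to the next instead of advancing in fixed ticks, which is what makes DES efficient for systems where state changes only at discrete moments.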

[17]  arXiv:2405.01563 [pdf, other]
Title: Mitigating LLM Hallucinations via Conformal Abstention
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a nonsensical or incorrect answer. Building on earlier approaches that use self-consistency as a more reliable measure of model confidence, we propose using the LLM itself to self-evaluate the similarity between each of its sampled responses for a given query. We then further leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate). Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets. It also maintains a significantly less conservative abstention rate than baselines that use log-probability scores to quantify uncertainty on a dataset with long responses (Temporal Sequences), while achieving comparable performance on a dataset with short answers (TriviaQA). To evaluate the experiments automatically, one needs to determine whether two responses to a question are equivalent. Following standard practice, we use a thresholded similarity function to determine whether two responses match, but we also provide a method for calibrating the threshold based on conformal prediction, with theoretical guarantees on the accuracy of the match prediction, which might be of independent interest.
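The abstract does not give the exact calibration rule, so the sketch below uses conformal risk control, one standard conformal technique, to choose an abstention threshold. Given a confidence score per calibration query (in the paper's setting, something like the self-consistency similarity) and whether the answer was correct, it picks the smallest threshold λ such that answering only when confidence exceeds λ keeps the expected rate of "answered and wrong" at or below α, assuming calibration and test queries are exchangeable. The scores and α values are made up.

```python
from typing import Sequence


def calibrate_abstention_threshold(
    confidences: Sequence[float], correct: Sequence[bool], alpha: float
) -> float:
    """Conformal risk control for abstention.

    The model answers a query only if confidence > lam. The loss on a calibration
    query is 1 if it answers and is wrong, else 0; this loss is non-increasing and
    right-continuous in lam and bounded by 1, so choosing the smallest lam with
        n/(n+1) * empirical_risk(lam) + 1/(n+1) <= alpha
    bounds the expected hallucination rate on a new exchangeable query by alpha.
    """
    n = len(confidences)
    # The empirical risk only changes at observed confidences; -inf means "answer everything".
    for lam in [float("-inf")] + sorted(set(confidences)):
        risk = sum(1 for c, ok in zip(confidences, correct) if c > lam and not ok) / n
        if n / (n + 1) * risk + 1 / (n + 1) <= alpha:
            return lam
    return float("inf")  # alpha is below 1/(n+1): abstain on every query


conf = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
correct = [True, True, True, False, True, False, False, False, False]
lam = calibrate_abstention_threshold(conf, correct, alpha=0.25)
print(lam)  # 0.4: answer only when confidence > 0.4, otherwise say "I don't know"
```

Smaller α pushes λ upward and the model abstains more often; with only nine calibration points, any α below 1/10 forces the model to abstain on everything, which shows why a reasonably large calibration set matters.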

[18]  arXiv:2405.01564 [pdf, other]
Title: Prioritizing Software Requirements Using Large Language Models
Subjects: Software Engineering (cs.SE)

Large Language Models (LLMs) are revolutionizing Software Engineering (SE) by introducing innovative methods for tasks such as collecting requirements, designing software, generating code, and creating test cases, among others. This article focuses on requirements engineering, typically seen as the initial phase of software development that involves multiple system stakeholders. Despite its key role, the challenge of identifying requirements and satisfying all stakeholders within time and budget constraints remains significant. To address the challenges in requirements engineering, this study introduces a web-based software tool utilizing AI agents and prompt engineering to automate task prioritization and apply diverse prioritization techniques, aimed at enhancing project management within the agile framework. This approach seeks to transform the prioritization of agile requirements, tackling the substantial challenge of meeting stakeholder needs within set time and budget limits. Furthermore, the source code of our developed prototype is available on GitHub, allowing for further experimentation and prioritization of requirements, facilitating research and practical application.

[19]  arXiv:2405.01565 [pdf, other]
Title: The Role of Code Proficiency in the Era of Generative AI
Comments: submitted to Software Engineering 2030
Subjects: Software Engineering (cs.SE)

At the current pace of technological advancements, Generative AI models, including both Large Language Models and Large Multi-modal Models, are becoming integral to the developer workspace. However, challenges emerge due to the 'black box' nature of many of these models, where the processes behind their outputs are not transparent. This position paper advocates for a 'white box' approach to these generative models, emphasizing the necessity of transparency and understanding in AI-generated code to match the proficiency levels of human developers and better enable software maintenance and evolution. We outline a research agenda aimed at investigating the alignment between AI-generated code and developer skills, highlighting the importance of responsibility, security, legal compliance, creativity, and social value in software development. The proposed research questions explore the potential of white-box methodologies to ensure that software remains an inspectable, adaptable, and trustworthy asset in the face of rapid AI integration, setting a course for research that could shape the role of code proficiency into 2030 and beyond.

[20]  arXiv:2405.01566 [pdf, other]
Title: 2HCDL: Holistic Human-Centered Development Lifecycle
Authors: Said Daoudagh (1), Eda Marchetti (1), Oum-El-Kheir Aktouf (2) ((1) CNR-ISTI, Pisa, Italy, (2) Univ. Grenoble Alpes, Grenoble INP, LCIS, Valence, France)
Comments: S. Bernardi, T. Zoppi (Editors), "Fast Abstracts and Student Forum Proceedings - EDCC 2024 - 19th European Dependable Computing Conference, Leuven, Belgium, 8-11 April 2024"
Subjects: Software Engineering (cs.SE)

Recent events affecting global society continue to highlight the need to change the development lifecycle of complex systems by promoting human-centered solutions that increase awareness and ensure critical properties such as security, safety, trust, transparency, and privacy. This fast abstract introduces the Holistic Human-Centered Development Lifecycle (2HCDL) methodology, which focuses on (i) the enforcement of human values and properties and (ii) the mitigation and prevention of critical issues, for more secure, safe, trustworthy, transparent, and private development processes.

[21]  arXiv:2405.01567 [pdf, other]
Title: CodeFort: Robust Training for Code Generation Models
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Code generation models are not robust to small perturbations, which often lead to inconsistent and incorrect generations and significantly degrade the performance of these models. Improving the robustness of code generation models is crucial for a better user experience when these models are deployed in real-world applications. However, existing efforts have not addressed this issue for code generation models. To fill this gap, we propose CodeFort, a framework to improve the robustness of code generation models. It generalizes a large variety of code perturbations to enrich the training data and enables various robust training strategies, mixing data augmentation, batch augmentation, adversarial logits pairing, and contrastive learning, all carefully designed to support high-throughput training. Extensive evaluations show that we improve the average robust pass rates of baseline CodeGen models from 14.79 to 21.74. Notably, the improvement in robustness against code-syntax perturbations is evidenced by a significant decrease in pass rate drop from 95.04% to 53.35%.
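CodeFort's full perturbation set is not listed here. As an illustration of one semantics-preserving code perturbation of the kind used to probe and train robustness, the sketch below renames variables with Python's standard `ast` module; a robust model should behave the same on both variants. The class name and the `var_N` naming scheme are hypothetical, and the transformer is a toy: it ignores scoping and assumes each local is assigned before it is read in source order.

```python
import ast
from typing import Dict


class RenameVariables(ast.NodeTransformer):
    """Rename parameters and assigned names to opaque ones (var_0, var_1, ...).

    Names never bound as a parameter or assignment target (globals, builtins,
    the function's own name) are left untouched.
    """

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}

    def _fresh(self, name: str) -> str:
        if name not in self.mapping:
            self.mapping[name] = f"var_{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._fresh(node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if isinstance(node.ctx, ast.Store):
            node.id = self._fresh(node.id)
        elif node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node


src = """
def add_all(nums):
    total = 0
    for n in nums:
        total += n
    return total
"""

# ast.unparse requires Python 3.9+.
perturbed = ast.unparse(RenameVariables().visit(ast.parse(src)))
print(perturbed)
```

Running the original and perturbed functions on the same inputs is a cheap way to confirm that a perturbation preserves semantics before using it to augment training data or to measure robust pass rates.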

[22]  arXiv:2405.01568 [pdf, ps, other]
Title: Convert any android device into a programmable IoT device with the help of IoT Everywhere Framework
Authors: Vishnu Joshi
Comments: 4 pages, 10 figures
Subjects: Software Engineering (cs.SE)

The world around us is transforming as the Internet of Things (IoT) spreads faster than we expected. Everyone in the tech industry is building remarkable things with the help of IoT: smartwatches, smart coffee machines, smart televisions, and smart homes are some examples. Building IoT sensor modules whose sensors connect to the internet can be very intimidating for people who have just entered the field. Quality components and microcontrollers can also be costly, including proximity sensors, humidity sensors, air pressure sensors, accelerometers, gyroscopes, flashlights, microphones, speakers, GSM modules, Wi-Fi modules, Bluetooth modules, and many more. However, to program these one needs to know Java or Kotlin and mobile application development. With the IoT Everywhere framework and the Origin programming language, one can convert any Android smartphone into an IoT device. Because it provides a high level of abstraction through simple function calls, it helps electrical engineering students grasp the idea of programming, can introduce programming to school students, helps students fascinated by IoT learn the basics of interfacing components and sensors, and lets students with no access to a personal computer learn to program.

[23]  arXiv:2405.01569 [pdf, other]
Title: A Systematic Literature Review on Reasons and Approaches for Accurate Effort Estimations in Agile
Comments: Journal article
Subjects: Software Engineering (cs.SE)

Background: Accurate effort estimation is crucial for planning in Agile iterative development. Agile estimation generally relies on consensus-based methods like planning poker, which require less time and information than other formal methods (e.g., COSMIC) but are prone to inaccuracies. Understanding the common reasons for inaccurate estimations and how proposed approaches can assist practitioners is essential. However, prior systematic literature reviews (SLRs) focus only on estimation practices (e.g., [26, 127]) and effort estimation approaches (e.g., [6]). Aim: We aim to identify themes of reasons for inaccurate estimations and classify approaches to improve effort estimation. Method: We conducted an SLR and identified the key themes and a taxonomy. Results: The reasons for inaccurate estimation are related to information quality, team, estimation practice, project management, and business influences. Effort estimation approaches were the most investigated in the literature, while only a few aim to support the effort estimation process. Yet a few automated approaches are at risk of data leakage and indirect validation scenarios. Recommendations: Practitioners should enhance the quality of information for effort estimation, potentially by adopting an automated approach. Future research should aim to improve information quality while avoiding data leakage and indirect validation scenarios.

[24]  arXiv:2405.01572 [pdf, other]
Title: A Semi-Formal Verification Methodology for Efficient Configuration Coverage of Highly Configurable Digital Designs
Comments: Published in DVCon U.S. 2021
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)

Nowadays, a majority of Systems-on-Chip (SoCs) make use of Intellectual Property (IP) in order to shorten development cycles. When such IPs are developed, one of the main focuses lies in the high configurability of the design. This flexibility on the design side introduces the challenge of covering a huge state space of IP configurations on the verification side to ensure functional correctness under every possible parameter setting. The vast number of possibilities rules out a brute-force approach, and therefore usually only a selected number of settings based on typical and extreme assumptions are verified. Especially in automotive applications, which must follow the ISO 26262 functional safety standard, the requirement of covering all significant variants must be fulfilled in any case. Existing state-of-the-art verification techniques, such as simulation-based verification and formal verification, face challenges such as time-space explosion and state-space explosion, respectively, and therefore lag behind in verifying highly configurable digital designs efficiently. This paper focuses on a semi-formal verification methodology for efficient configuration coverage of highly configurable digital designs. The methodology aims at reduced runtime by combining simulative and formal methods that allow high configuration coverage. The paper also presents the results of applying the developed methodology to a highly configurable microprocessor IP and discusses the benefits gained.
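To see why brute force breaks down, note that exhaustive configuration coverage needs the product of all parameter-domain sizes. The abstract does not describe how its methodology selects configurations, so the sketch below shows a different, widely used technique purely as an illustration: greedy pairwise (2-way) combinatorial coverage, which picks configurations until every pair of values of every two parameters appears at least once. The IP parameter names and values are hypothetical.

```python
from collections.abc import Hashable
from itertools import combinations, product
from math import prod
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, Hashable, int, Hashable]  # (param i, value, param j, value) with i < j


def pair(i: int, a: Hashable, j: int, b: Hashable) -> Pair:
    """Canonical ordering so (i, a, j, b) and (j, b, i, a) are the same pair."""
    return (i, a, j, b) if i < j else (j, b, i, a)


def all_pairs(domains: Sequence[Sequence[Hashable]]) -> Set[Pair]:
    return {
        (i, a, j, b)
        for i, j in combinations(range(len(domains)), 2)
        for a, b in product(domains[i], domains[j])
    }


def pairs_of(row: Sequence[Hashable]) -> Set[Pair]:
    return {(i, row[i], j, row[j]) for i, j in combinations(range(len(row)), 2)}


def greedy_pairwise(domains: Sequence[Sequence[Hashable]]) -> List[Tuple[Hashable, ...]]:
    """Greedily build configurations until every value pair is covered.

    Parameter values must not be None (None marks an unassigned slot).
    """
    uncovered = all_pairs(domains)
    rows: List[Tuple[Hashable, ...]] = []
    while uncovered:
        i, a, j, b = min(uncovered, key=repr)  # seed each row with an uncovered pair
        row: List = [None] * len(domains)
        row[i], row[j] = a, b
        for k, values in enumerate(domains):
            if row[k] is None:
                # Choose the value of parameter k that covers the most new pairs.
                row[k] = max(
                    values,
                    key=lambda v: sum(
                        pair(k, v, m, row[m]) in uncovered
                        for m in range(len(domains))
                        if m != k and row[m] is not None
                    ),
                )
        rows.append(tuple(row))
        uncovered -= pairs_of(row)  # the seed pair is always removed, so the loop ends
    return rows


# Hypothetical IP parameters (names and values invented for illustration).
params = {
    "bus_width": [32, 64, 128],
    "cache_kb": [0, 16, 64],
    "ecc": [False, True],
    "pipeline_stages": [3, 5, 7],
}
domains = list(params.values())
rows = greedy_pairwise(domains)
exhaustive = prod(len(d) for d in domains)  # 54 configurations
print(f"{len(rows)} pairwise configurations instead of {exhaustive}")
```

Pairwise coverage only guarantees that every 2-way interaction is exercised; faults triggered by three or more parameters together can still slip through, which is why safety-critical flows such as ISO 26262 projects typically combine it with other evidence.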

[25]  arXiv:2405.01573 [pdf, other]
Title: Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository
Comments: Preprint
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level in various benchmarks. However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Existing research often treats class-level generation as an isolated task, neglecting the intricate dependencies and interactions that characterize real-world software development environments. To address this gap, we introduce RepoClassBench, a benchmark designed to rigorously evaluate LLMs in generating complex, class-level code within real-world repositories. RepoClassBench includes natural-language-to-class generation tasks across Java and Python, drawn from a selection of public repositories. We ensure that each class in our dataset not only has cross-file dependencies within the repository but also includes corresponding test cases to verify its functionality. We find that current models struggle with the realistic challenges posed by our benchmark, primarily due to their limited exposure to relevant repository contexts. To address this shortcoming, we introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate and reason about repository-level context in an agent-based framework. Our experiments demonstrate that RRR significantly outperforms existing baselines on RepoClassBench, showcasing its effectiveness across programming languages and in various settings. Our findings emphasize the need for benchmarks that incorporate repository-level dependencies to more accurately reflect the complexities of software development. Our work illustrates the benefits of leveraging specialized tools to enhance LLMs' understanding of repository context. We plan to make our dataset and evaluation harness public.

[26]  arXiv:2405.01574 [pdf, ps, other]
Title: On Using Agent-based Modeling and Simulation for Studying Blockchain Systems
Authors: Önder Gürcan
Comments: 2 pages, "JFMS 2020 -- Les Journees Francophones de la Modelisation et de la Simulation -- Convergences entre la Theorie de la Modelisation et la Simulation et les Systemes Multi-Agents"
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

There is a need for a simulation framework, developed as software using modern engineering approaches (e.g., modularity (i.e., model reuse), testing, continuous development and continuous integration, and automated management of builds, dependencies, and documentation) and agile principles, in order (1) to rapidly prototype industrial cases and (2) to carry out their feasibility analysis realistically (i.e., to test hypotheses by simulating complex experiments involving large numbers of participants of different types acting in one or several blockchain systems).

[27]  arXiv:2405.01575 [pdf, other]
Title: Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024
Comments: Software mention recognition, Named entity recognition, Transformer, Three-stage framework
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This paper describes our systems for Sub-task I of the Software Mention Detection in Scholarly Publications shared task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework: (1) Entity Sentence Classification classifies sentences containing potential software mentions; (2) Entity Extraction detects mentions within the classified sentences; (3) Entity Type Classification categorizes detected mentions into specific software types. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. Our framework based on XLM-R achieves a weighted F1-score of 67.80%, earning our team 3rd place in Sub-task I for the Software Mention Recognition task.
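The cascade above can be sketched as three pluggable stages, where stage 2 only runs on sentences that pass stage 1. In the paper each stage is a fine-tuned pre-trained language model; here the stages are hypothetical lexicon-based stand-ins, and the type labels are illustrative assumptions, so only the control flow is meant to carry over.

```python
from typing import Callable, Dict, List, Tuple

# Stage signatures; the paper's stages are fine-tuned LMs, these are stand-ins.
SentenceClassifier = Callable[[str], bool]
Extractor = Callable[[str], List[str]]
TypeClassifier = Callable[[str, str], str]


def three_stage_ner(
    sentences: List[str],
    has_mention: SentenceClassifier,
    extract: Extractor,
    classify_type: TypeClassifier,
) -> List[Tuple[str, str, str]]:
    results = []
    for sentence in sentences:
        # Stage 1: drop sentences unlikely to contain a software mention.
        if not has_mention(sentence):
            continue
        # Stage 2: extract mention spans only from retained sentences.
        for mention in extract(sentence):
            # Stage 3: assign a software type to each extracted mention.
            results.append((sentence, mention, classify_type(sentence, mention)))
    return results


# Hypothetical toy lexicon standing in for trained models.
LEXICON: Dict[str, str] = {"SPSS": "Application", "Python": "ProgrammingEnvironment"}


def lexicon_has_mention(sentence: str) -> bool:
    return any(name in sentence for name in LEXICON)


def lexicon_extract(sentence: str) -> List[str]:
    return [name for name in LEXICON if name in sentence]


def lexicon_type(sentence: str, mention: str) -> str:
    return LEXICON[mention]


mentions = three_stage_ner(
    ["Data were analysed in SPSS.", "Participants were recruited online."],
    lexicon_has_mention,
    lexicon_extract,
    lexicon_type,
)
print(mentions)  # [('Data were analysed in SPSS.', 'SPSS', 'Application')]
```

Filtering first means the extractor never sees most sentences, which is one plausible reason a cascade can reduce false positives on sparse mention data.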

[28]  arXiv:2405.01576 [pdf, other]
Title: Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We study the tendency of AI systems to deceive by constructing a realistic simulation setting of a company AI assistant. The simulated company employees provide tasks for the assistant to complete, spanning writing assistance, information retrieval, and programming. We then introduce situations where the model might be inclined to behave deceptively, while taking care not to instruct or otherwise pressure the model to do so. Across different scenarios, we find that Claude 3 Opus
1) complies with a task of mass-generating comments to influence public perception of the company, later deceiving humans about having done so,
2) lies to auditors when asked questions, and
3) strategically pretends to be less capable than it is during capability evaluations.
Our work demonstrates that even models trained to be helpful, harmless and honest sometimes behave deceptively in realistic scenarios, without notable external pressure to do so.

[29]  arXiv:2405.01577 [pdf, other]
Title: HateTinyLLM : Hate Speech Detection Using Tiny Large Language Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Hate speech encompasses verbal, written, or behavioral communication that uses derogatory or discriminatory language against individuals or groups based on sensitive characteristics. Automated hate speech detection plays a crucial role in curbing its propagation, especially across social media platforms. Various methods, including recent advancements in deep learning, have been devised to address this challenge. In this study, we introduce HateTinyLLM, a novel framework based on fine-tuned decoder-only tiny large language models (tinyLLMs) for efficient hate speech detection. Our experimental findings demonstrate that the fine-tuned HateTinyLLM outperforms the pretrained mixtral-7b model by a significant margin. We explored various tiny LLMs, including PY007/TinyLlama-1.1B-step-50K-105b, Microsoft/phi-2, and facebook/opt-1.3b, and fine-tuned them using LoRA and adapter methods. Our observations indicate that all LoRA-based fine-tuned models achieved over 80% accuracy.
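For readers unfamiliar with LoRA, the sketch below shows the standard low-rank update on a single linear layer: the pretrained weight stays frozen and only two small matrices are trained. It is a generic illustration in plain Python; the rank, scaling factor `alpha`, and toy matrices are assumptions, not the configuration used in the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """h = W0 x + (alpha / r) * B (A x).

    W0 (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    r = len(a)  # LoRA rank = number of rows of A
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]


w0 = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (toy 2x2)
a = [[1.0, 1.0]]               # rank r = 1
b = [[0.5], [0.0]]
print(lora_forward(w0, a, b, [1.0, 2.0], alpha=2.0))  # [4.0, 2.0]
```

When B is initialised to zeros, as is conventional, the adapted layer starts out identical to the pretrained one, and the trainable parameter count drops from d_out * d_in to r * (d_in + d_out).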

[30]  arXiv:2405.01578 [pdf, other]
Title: Empowering IoT Applications with Flexible, Energy-Efficient Remote Management of Low-Power Edge Devices
Comments: 4 pages, Proceedings of the 2023 International Conference on Embedded Wireless Systems and Networks
Subjects: Software Engineering (cs.SE); Networking and Internet Architecture (cs.NI)

In the context of the Internet of Things (IoT), reliable and energy-efficient provision of IoT applications has become critical. Equipping IoT systems with tools that enable flexible, well-performing, and automated monitoring and management of IoT edge devices is an essential prerequisite. In current IoT systems, low-power edge appliances are used in a way that cannot be controlled and reconfigured in a timely manner. Hence, a solution that balances manageability, performance, and design requirements is needed. This paper introduces a novel approach for fine-grained monitoring and management of individual micro-services within low-power edge devices, which improves system reliability and energy efficiency. The proposed method enables operational flexibility for IoT edge devices by leveraging a modularization technique. Following a review of existing solutions for remotely managed IoT services, a detailed description of the suggested approach is presented. To explore the essential design principles that must be considered in this approach, the suggested architecture is also elaborated in detail. Finally, proof-of-concept experiments demonstrate the advantages of the proposed solution in dealing with disruptions.

[31]  arXiv:2405.01579 [pdf, other]
Title: Mining patterns in syntax trees to automate code reviews of student solutions for programming exercises
Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY); Machine Learning (cs.LG)

In programming education, providing manual feedback is essential but labour-intensive, posing challenges in consistency and timeliness. We introduce ECHO, a machine learning method to automate the reuse of feedback in educational code reviews by analysing patterns in abstract syntax trees. This study investigates two primary questions: whether ECHO can predict feedback annotations to specific lines of student code based on previously added annotations by human reviewers (RQ1), and whether its training and prediction speeds are suitable for using ECHO for real-time feedback during live code reviews by human reviewers (RQ2). Our results, based on annotations from both automated linting tools and human reviewers, show that ECHO can accurately and quickly predict appropriate feedback annotations. Its efficiency in processing and its flexibility in adapting to feedback patterns can significantly reduce the time and effort required for manual feedback provisioning in educational settings.
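The idea of reusing reviewer feedback via syntax-tree patterns can be illustrated with a deliberately crude sketch: each line is summarised by the set of Python AST node types that start on it, and an annotation is suggested for any new line whose signature matches an annotated one. This is not ECHO's mining algorithm; the signature scheme, function names, and example annotation are hypothetical.

```python
import ast
from collections import defaultdict
from typing import Dict, FrozenSet


def line_signatures(source: str) -> Dict[int, FrozenSet[str]]:
    """Map each line to the set of AST node types that start on it."""
    types_by_line = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        lineno = getattr(node, "lineno", None)  # operators/contexts carry no position
        if lineno is not None:
            types_by_line[lineno].add(type(node).__name__)
    return {line: frozenset(types) for line, types in types_by_line.items()}


def learn(source: str, annotations: Dict[int, str]) -> Dict[FrozenSet[str], str]:
    """Remember each reviewer annotation under the signature of its line."""
    signatures = line_signatures(source)
    return {signatures[line]: text for line, text in annotations.items()}


def suggest(source: str, memory: Dict[FrozenSet[str], str]) -> Dict[int, str]:
    """Propose stored annotations for lines whose signature was seen before."""
    return {
        line: memory[sig]
        for line, sig in line_signatures(source).items()
        if sig in memory
    }


reviewed = "x = 1\nif x == True:\n    pass\n"
memory = learn(reviewed, {2: "Avoid comparing to a boolean literal with =="})
print(suggest("y = 2\nif y == False:\n    print(y)\n", memory))
# {2: 'Avoid comparing to a boolean literal with =='}
```

This signature over-matches (`if y == 3:` would receive the same suggestion), which illustrates why mining richer subtree patterns matters for precise feedback.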

[32]  arXiv:2405.01580 [pdf, other]
Title: On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Authors: Atharva Naik
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

The task of code generation from natural language (NL2Code) has become extremely popular, especially with the advent of Large Language Models (LLMs). However, efforts to quantify and track this progress have suffered from a lack of reliable metrics for functional correctness. While popular benchmarks like HumanEval have test cases that enable reliable evaluation of correctness, collecting test cases is time-consuming and requires human effort. As an alternative, several reference-based evaluation metrics have been proposed, with embedding-based metrics like CodeBERTScore being touted as highly correlated with human preferences and functional correctness. In our work, we analyze the ability of embedding-based metrics like CodeBERTScore to measure functional correctness and other helpful constructs, such as editing effort, by analyzing the outputs of ten models on two popular code generation benchmarks. Our results show that while these metrics have a weak correlation with functional correctness (0.16), they are strongly correlated (0.72) with editing effort.

[33]  arXiv:2405.01581 [pdf, other]
Title: The Mercurial Top-Level Ontology of Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In our work, we systematize and analyze implicit ontological commitments in the responses generated by large language models (LLMs), focusing on ChatGPT 3.5 as a case study. We investigate how LLMs, despite having no explicit ontology, exhibit implicit ontological categorizations that are reflected in the texts they generate. The paper proposes an approach to understanding the ontological commitments of LLMs by defining ontology as a theory that provides a systematic account of the ontological commitments of some text. We investigate the ontological assumptions of ChatGPT and present a systematized account, i.e., GPT's top-level ontology. This includes a taxonomy, which is available as an OWL file, as well as a discussion about ontological assumptions (e.g., about its mereology or presentism). We show that in some aspects GPT's top-level ontology is quite similar to existing top-level ontologies. However, there are significant challenges arising from the flexible nature of LLM-generated texts, including ontological overload, ambiguity, and inconsistency.

[34]  arXiv:2405.01582 [pdf, other]
Title: Text Quality-Based Pruning for Efficient Training of Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In recent times, training Language Models (LMs) has relied on computationally heavy training over massive datasets, which makes the training process extremely laborious. In this paper, we propose a novel method for numerically evaluating text quality in large unlabelled NLP datasets in a model-agnostic manner, assigning each text instance a "quality score".
By proposing the text quality metric, the paper establishes a framework to identify and eliminate low-quality text instances, leading to improved training efficiency for LMs. Experimental results over multiple models and datasets demonstrate the efficacy of this approach, showcasing substantial gains in training effectiveness and highlighting the potential for resource-efficient LM training.
For example, when training on the OpenWebText dataset, we observe an absolute accuracy improvement of 0.9%, averaged over 14 downstream evaluation tasks, for multiple LMs while using 40% less data and training 42% faster; on the Wikipedia dataset, we observe a 0.8% average absolute accuracy improvement while using 20% less data and training 21% faster.
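The score-then-prune step can be sketched as follows. The abstract does not say how the quality score is computed, so `quality_score` below is a hypothetical heuristic (share of alphabetic word tokens); only the rank-and-drop structure mirrors the described framework, and `keep_fraction=0.6` simply echoes the 40% data reduction quoted for OpenWebText.

```python
from typing import List


def quality_score(text: str) -> float:
    """Hypothetical proxy: fraction of whitespace tokens that are alphabetic words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    words = [t for t in tokens if t.strip(".,;:!?").isalpha()]
    return len(words) / len(tokens)


def prune(corpus: List[str], keep_fraction: float) -> List[str]:
    """Keep the highest-scoring `keep_fraction` of documents (stable for ties)."""
    ranked = sorted(corpus, key=quality_score, reverse=True)
    return ranked[: round(len(ranked) * keep_fraction)]


corpus = [
    "The cat sat on the mat.",
    "Click here >>> $$$ 1234 !!!",
    "Rivers carry sediment to the sea.",
    "@@@ ### %%%",
    "Results improved by 3 points overall.",
]
kept = prune(corpus, keep_fraction=0.6)
print(kept)
```

Because the score is computed without any model, the same pruned subset can be reused across architectures, which is what "model agnostic" buys in practice.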

[35]  arXiv:2405.01583 [pdf, other]
Title: MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning
Authors: Nadia Saeed
Comments: 7 pages, 3 figures, Clinical NLP 2024 workshop proceedings in Shared Task
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The MEDIQA-M3G 2024 challenge necessitates novel solutions for Multilingual & Multimodal Medical Answer Generation in dermatology (wai Yim et al., 2024a). This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question-answering (QA). Our system leverages readily available MEDIQA-M3G images via a VGG16-CNN-SVM model, enabling multilingual (English, Chinese, Spanish) learning of informative skin condition representations. Using pre-trained QA models, we further bridge the gap between visual and textual information through multimodal fusion. This approach tackles complex, open-ended questions even without predefined answer choices. We empower the generation of comprehensive answers by feeding the ViT-CLIP model with multiple responses alongside images. This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.

[36]  arXiv:2405.01584 [pdf, other]
Title: Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression
Comments: 12 pages, TKDE format
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Signal Processing (eess.SP)

We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm first employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. Subsequently, the dictionaries are refined using label data, optimizing dictionary atoms to enhance discriminative power based on mutual information and class distribution. This process generates discriminative numerical representations, facilitating the training of simple classifiers such as SVMs and neural networks. We evaluate our algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify it. Tested on six benchmark text datasets, our algorithm closely matches top-performing models on limited-vocabulary datasets, deviating by only ~2% while using just 10% of their parameters. However, it falls short on diverse-vocabulary datasets, likely due to the LZW algorithm's constraints with low-repetition data. This contrast highlights its efficiency and limitations across different dataset types.
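The first phase can be pictured with the classic LZW dictionary-building loop: the longest known phrase is extended one symbol at a time, and each unseen extension becomes a new dictionary entry. This character-level sketch seeds the table with the observed alphabet rather than all 256 bytes, and it omits the label-based refinement phase; the paper's actual units and filtering are not specified here.

```python
from typing import Dict


def lzw_phrases(text: str) -> Dict[str, int]:
    """Return the multi-character phrases LZW adds to its dictionary while scanning text."""
    # Seed with the observed alphabet (classic LZW seeds all 256 byte values).
    table = {ch: code for code, ch in enumerate(sorted(set(text)))}
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in table:
            phrase = candidate  # keep extending the longest known match
        else:
            table[candidate] = len(table)  # learn the new phrase
            phrase = ch
    return {p: code for p, code in table.items() if len(p) > 1}


print(lzw_phrases("abababab"))  # {'ab': 2, 'ba': 3, 'aba': 4, 'abab': 5}
```

Repetitive input quickly yields long phrases, while low-repetition text yields mostly short ones, which is consistent with the weaker results reported on diverse-vocabulary datasets.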

[37]  arXiv:2405.01585 [pdf, other]
Title: Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications
Comments: 11 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

In recent times, Large Language Models have exhibited tremendous capabilities, especially in the areas of mathematics, code generation, and general-purpose reasoning. However, in specialized domains, especially in applications that require parsing and analyzing large chunks of numeric or tabular data, even state-of-the-art (SOTA) models struggle. In this paper, we introduce a new approach to solving domain-specific tabular data analysis tasks by presenting a unique RAG workflow that mitigates the scalability issues of existing tabular LLM solutions. Specifically, we present Tabular Embedding Model (TEM), a novel approach to fine-tuning embedding models for tabular Retrieval-Augmented Generation (RAG) applications. Embedding models form a crucial component of the RAG workflow, and even current SOTA embedding models struggle here because they are predominantly trained on textual datasets and thus underperform in scenarios involving complex tabular data. The evaluation results show that our approach not only outperforms current SOTA embedding models in this domain but also does so with a notably smaller and more efficient model structure.

[38]  arXiv:2405.01586 [pdf, other]
Title: Transfer Learning and Transformer Architecture for Financial Sentiment Analysis
Comments: 12 pages, 9 figures
Journal-ref: Proceedings of International Conference on Computational Intelligence, Data Science and Cloud Computing: IEM-ICDC 2021,pages 17--27
Subjects: Computation and Language (cs.CL)

Financial sentiment analysis allows financial institutions such as banks and insurance companies to better manage the credit scoring of their customers. The financial domain uses specialized mechanisms, which make sentiment analysis difficult. In this paper, we propose a pre-trained language model that can help solve this problem with less labelled data. We build on the principles of transfer learning and the Transformer architecture, and also take into consideration recent pandemic outbreaks such as COVID. We apply the sentiment analysis to two different datasets. We also fine-tune the model on a smaller training set.

[39]  arXiv:2405.01587 [pdf, ps, other]
Title: Improve Academic Query Resolution through BERT-based Question Extraction from Images
Journal-ref: 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI) volume 2 (2024) 1-4
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Providing fast and accurate resolution of student queries is an essential service offered by Edtech organizations. It is generally provided through a chatbot-like interface that lets students ask their questions easily. One preferred format for student queries is images, as it allows students to capture and post questions without typing complex equations and information. However, this format also presents difficulties, as images may contain multiple questions or textual noise that lowers the accuracy of existing single-query answering solutions. In this paper, we propose a method for extracting questions from text or images using a BERT-based deep learning model and compare it with rule-based and layout-based methods. Our method aims to improve the accuracy and efficiency of student query resolution in Edtech organizations.

[40]  arXiv:2405.01588 [pdf, other]
Title: Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL
Comments: DPFM Workshop, ICLR 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Incorporating unanswerable questions into EHR QA systems is crucial for testing the trustworthiness of a system, as providing non-existent responses can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a promising benchmark because it is the only dataset that incorporates unanswerable questions in the EHR QA system alongside practical questions. However, in this work, we identify a data bias in these unanswerable questions; they can often be discerned simply by filtering with specific N-gram patterns. Such biases jeopardize the authenticity and reliability of QA system evaluations. To tackle this problem, we propose a simple debiasing method of adjusting the split between the validation and test sets to neutralize the undue influence of N-gram filtering. By experimenting on the MIMIC-III dataset, we demonstrate both the existing data bias in EHRSQL and the effectiveness of our data split strategy in mitigating this bias.
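The kind of bias described above can be demonstrated on toy data: collect n-grams that occur only in unanswerable questions, then "detect" unanswerability by simple pattern matching. The questions below are hypothetical and not drawn from EHRSQL, and the paper's remedy (re-splitting validation and test sets) is not shown; the sketch only illustrates why such a shortcut inflates evaluation scores.

```python
from collections import Counter
from typing import List, Set


def ngrams(text: str, n: int) -> Set[str]:
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def leaky_ngrams(
    questions: List[str], unanswerable: List[bool], n: int = 2, min_count: int = 2
) -> Set[str]:
    """N-grams seen at least min_count times in unanswerable questions and never in answerable ones."""
    unans_counts: Counter = Counter()
    ans_counts: Counter = Counter()
    for question, is_unans in zip(questions, unanswerable):
        (unans_counts if is_unans else ans_counts).update(ngrams(question, n))
    return {g for g, c in unans_counts.items() if c >= min_count and g not in ans_counts}


def flag_unanswerable(question: str, patterns: Set[str], n: int = 2) -> bool:
    """Shortcut 'detector': flag any question containing a leaky n-gram."""
    return bool(ngrams(question, n) & patterns)


questions = [
    "how many patients received aspirin",
    "what is the dosage of heparin",
    "what is the patient's favorite color",
    "what is the patient's phone number",
]
labels = [False, False, True, True]
patterns = leaky_ngrams(questions, labels)
print(patterns)  # {"the patient's"}
print(flag_unanswerable("what is the patient's home address", patterns))  # True
```

If held-out unanswerable questions share such surface patterns with the training split, a system can score well without understanding the database schema at all.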

[41]  arXiv:2405.01589 [pdf, ps, other]
Title: GPT-4 passes most of the 297 written Polish Board Certification Examinations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Introduction: Recently, the effectiveness of Large Language Models (LLMs) has increased rapidly, allowing them to be used in a great number of applications. However, the risks posed by the generation of false information through LLMs significantly limit their applications in sensitive areas such as healthcare, highlighting the necessity for rigorous validation to determine their utility and reliability. To date, no study has extensively compared the performance of LLMs on Polish medical examinations across a broad spectrum of specialties on a very large dataset. Objectives: This study evaluated the performance of three Generative Pretrained Transformer (GPT) models on the Polish Board Certification Exam (Państwowy Egzamin Specjalizacyjny, PES) dataset, which consists of 297 tests. Methods: We developed a software program to download and process PES exams and tested the performance of GPT models using the OpenAI Application Programming Interface. Results: Our findings reveal that GPT-3.5 did not pass any of the analyzed exams. In contrast, the GPT-4 models demonstrated the capability to pass the majority of the exams evaluated, with the most recent model, gpt-4-0125, successfully passing 222 (75%) of them. The performance of the GPT models varied significantly, excelling in exams related to certain specialties while completely failing others. Conclusions: The significant progress and impressive performance of LLMs hold great promise for the increased application of AI in the field of medicine in Poland. For instance, this advancement could lead to the development of AI-based medical assistants for healthcare professionals, enhancing the efficiency and accuracy of medical services.

[42]  arXiv:2405.01590 [pdf, other]
Title: 101 Billion Arabic Words Dataset
Subjects: Computation and Language (cs.CL)

In recent years, Large Language Models have revolutionized the field of natural language processing, showcasing an impressive rise predominantly in English-centric domains. These advancements have set a global benchmark, inspiring significant efforts toward developing Arabic LLMs capable of understanding and generating the Arabic language with remarkable accuracy. Despite these advancements, a critical challenge persists: the potential bias in Arabic LLMs, primarily attributed to their reliance on datasets comprising English data that has been translated into Arabic. This reliance not only compromises the authenticity of the generated content but also reflects a broader issue: the scarcity of original, high-quality Arabic linguistic data. This study aims to address the data scarcity in the Arab world and to encourage the development of Arabic Language Models that are true to both the linguistic and cultural nuances of the region. We undertook a large-scale data mining project, extracting a substantial volume of text from the Common Crawl WET files, specifically targeting Arabic content. The extracted data underwent a rigorous cleaning and deduplication process, using innovative techniques to ensure the integrity and uniqueness of the dataset. The result is the 101 Billion Arabic Words Dataset, the largest Arabic dataset available to date, which can significantly contribute to the development of authentic Arabic LLMs. This study not only highlights the potential for creating linguistically and culturally accurate Arabic LLMs but also sets a precedent for future research in enhancing the authenticity of Arabic language models.
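The abstract does not detail its deduplication techniques, so the following is only a minimal sketch of the simplest layer such a pipeline usually has: exact deduplication on a hash of normalized text. The normalization choices (NFKC plus whitespace collapsing) are assumptions, and near-duplicate methods such as MinHash are not shown.

```python
import hashlib
import unicodedata
from typing import List


def normalize(text: str) -> str:
    # NFKC folds Unicode compatibility forms; split/join collapses runs of whitespace.
    return " ".join(unicodedata.normalize("NFKC", text).split())


def exact_dedupe(docs: List[str]) -> List[str]:
    """Keep the first occurrence of each document, comparing SHA-256 digests of normalized text."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


pages = ["مرحبا بالعالم", "مرحبا   بالعالم\n", "نص عربي آخر"]
print(len(exact_dedupe(pages)))  # 2
```

Storing fixed-size digests instead of full documents keeps the `seen` set small enough to scale to crawl-sized corpora.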

[43]  arXiv:2405.01591 [pdf, other]
Title: Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Comments: Under review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their ability to generalize from only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises questions about the feasibility of these models when encountering the inevitable variations and errors inherent in real-world medical data. In this paper, we introduce MID-M, a novel framework that leverages the in-context learning capabilities of a general-domain Large Language Model (LLM) to process multimodal data via image descriptions. MID-M achieves comparable or superior performance to task-specific fine-tuned LMMs and other general-domain ones, without extensive domain-specific training or pre-training on multimodal data, and with significantly fewer parameters. This highlights the potential of leveraging general-domain LLMs for domain-specific tasks and offers a sustainable and cost-effective alternative to traditional LMM developments. Moreover, the robustness of MID-M against data quality issues demonstrates its practical utility in real-world medical domain applications.

[44]  arXiv:2405.01592 [pdf, ps, other]
Title: Text and Audio Simplification: Human vs. ChatGPT
Comments: AMIA Summit, Boston, 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Text and audio simplification to increase information comprehension is important in healthcare. With the introduction of ChatGPT, an evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT simplified corpora. We then compared these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated these texts and five new ChatGPT simplified versions. We found that simple corpora show higher similarity with the human simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention.

[45]  arXiv:2405.01593 [pdf, other]
Title: Large Language Model Agent for Fake News Detection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

In the current digital era, the rapid spread of misinformation on online platforms presents significant challenges to societal well-being, public trust, and democratic processes, influencing critical decision making and public opinion. To address these challenges, there is a growing need for automated fake news detection mechanisms. Pre-trained large language models (LLMs) have demonstrated exceptional capabilities across various natural language processing (NLP) tasks, prompting exploration into their potential for verifying news claims. Instead of employing LLMs in a non-agentic way, where LLMs generate responses based on direct prompts in a single shot, our work introduces FactAgent, an agentic approach of utilizing LLMs for fake news detection. FactAgent enables LLMs to emulate human expert behavior in verifying news claims without any model training, following a structured workflow. This workflow breaks down the complex task of news veracity checking into multiple sub-steps, where LLMs complete simple tasks using their internal knowledge or external tools. At the final step of the workflow, LLMs integrate all findings throughout the workflow to determine the news claim's veracity. Compared to manual human verification, FactAgent offers enhanced efficiency. Experimental studies demonstrate the effectiveness of FactAgent in verifying claims without the need for any training process. Moreover, FactAgent provides transparent explanations at each step of the workflow and during final decision-making, offering insights into the reasoning process of fake news detection for end users. FactAgent is highly adaptable, allowing for straightforward updates to its tools that LLMs can leverage within the workflow, as well as updates to the workflow itself using domain knowledge. This adaptability enables FactAgent's application to news verification across various domains.

[46]  arXiv:2405.01597 [pdf, other]
Title: Improving Disease Detection from Social Media Text via Self-Augmentation and Contrastive Learning
Subjects: Computation and Language (cs.CL)

Detecting diseases from social media has diverse applications, such as public health monitoring and disease spread detection. While language models (LMs) have shown promising performance in this domain, there remains ongoing research aimed at refining their discriminating representations. In this paper, we propose a novel method that integrates Contrastive Learning (CL) with language modeling to address this challenge. Our approach introduces a self-augmentation method, wherein hidden representations of the model are augmented with their own representations. This method comprises two branches: the first branch, a traditional LM, learns features specific to the given data, while the second branch incorporates augmented representations from the first branch to encourage generalization. CL further refines these representations by pulling pairs of original and augmented versions closer while pushing other samples away. We evaluate our method on three NLP datasets encompassing binary, multi-label, and multi-class classification tasks involving social media posts related to various diseases. Our approach demonstrates notable improvements over traditional fine-tuning methods, achieving up to a 2.48% increase in F1-score compared to baseline approaches and a 2.1% enhancement over state-of-the-art methods.
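The "pull pairs together, push other samples away" objective described above is commonly implemented as an InfoNCE-style loss. The abstract does not name the exact loss, so the sketch below assumes InfoNCE with cosine similarity and a temperature of 0.1; it also takes the augmented vectors as given rather than reproducing the paper's self-augmentation of hidden representations.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(originals: List[Vector], augmented: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: augmented[i] is the positive for originals[i]; all other
    augmented rows act as negatives that the loss pushes away."""
    total = 0.0
    for i, anchor in enumerate(originals):
        logits = [cosine(anchor, other) / temperature for other in augmented]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(originals)


originals = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each augmented view near its own original
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong originals
print(info_nce(originals, aligned) < info_nce(originals, swapped))  # True
```

The loss is low only when each original is closer to its own augmented view than to any other sample, which is the discriminative pressure the method relies on.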

[47]  arXiv:2405.01599 [pdf, ps, other]
Title: Xabclib: A Fully Auto-tuned Sparse Iterative Solver
Comments: This article was submitted to SC11, and also was published as a preprint for Research Gate in April 2011. Please refer to: this https URL
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to make AT functions reusable. Using OpenATLib, we develop a fully auto-tuned sparse iterative solver named Xabclib. Xabclib has several novel run-time AT functions. First, the following new implementations of sparse matrix-vector multiplication (SpMV) for thread processing are provided: (1) non-zero elements; (2) omission of zero-element computation for vector reduction; (3) branchless segmented scan (BSS). A performance evaluation against conventional implementations yields the following results: (1) a 14x speedup for non-zero elements and zero-element computation omission in symmetric SpMV; (2) a 4.62x speedup using BSS. We also develop a "numerical computation policy" that can optimize memory space and computational accuracy. Using the policy, we obtain the following: (1) an average 1/45 memory space reduction; (2) avoidance of "fault convergence", a problem of conventional solvers.
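
For readers unfamiliar with these SpMV variants, the sketch below contrasts a plain compressed sparse row (CSR) SpMV, which never touches zero entries, with the same product phrased as a segmented scan over all non-zeros, where the per-row reset is done arithmetically rather than with a branch. It is a sequential Python illustration of the general technique, assuming CSR storage; it is not Xabclib's threaded code, and the function names are hypothetical.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int], values: List[float], x: List[float]) -> List[float]:
    """y = A x with A in CSR form: only stored non-zeros are ever multiplied."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def spmv_segmented_scan(row_ptr: List[int], col_idx: List[int], values: List[float], x: List[float]) -> List[float]:
    """Same product as a segmented inclusive scan over the flat array of non-zeros."""
    nnz = len(values)
    products = [values[k] * x[col_idx[k]] for k in range(nnz)]
    head = [0] * nnz                     # head[k] == 1 where a new row segment starts
    for start in row_ptr[:-1]:
        if start < nnz:
            head[start] = 1
    scan = [0.0] * nnz
    running = 0.0
    for k in range(nnz):
        # Reset at segment heads arithmetically instead of with an if-branch.
        running = running * (1 - head[k]) + products[k]
        scan[k] = running
    # The last element of each non-empty row segment holds that row's sum.
    return [scan[row_ptr[r + 1] - 1] if row_ptr[r + 1] > row_ptr[r] else 0.0
            for r in range(len(row_ptr) - 1)]
```

For A = [[4, 0, 1], [0, 0, 0], [0, 2, 3]] stored as `row_ptr=[0, 2, 2, 4]`, `col_idx=[0, 2, 1, 2]`, `values=[4.0, 1.0, 2.0, 3.0]` and x = [1, 2, 3], both functions return [7.0, 0.0, 13.0]. The scan form keeps the inner loop free of row-boundary branches, which is the property that makes it attractive for threaded or vectorized execution.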

[48]  arXiv:2405.01601 [pdf, other]
Title: Efficient Sample-Specific Encoder Perturbations
Comments: To appear in NAACL 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks. This paper proposes a simple, lightweight, and inference-efficient approach to modifying the behaviour of an encoder-decoder system according to a specific attribute of interest. Specifically, we show that a small proxy network can be used to find a sample-by-sample perturbation of the encoder output of a frozen foundation model that triggers the decoder to generate improved decodings. This work explores a specific realization of this framework focused on improving the COMET performance of Flan-T5 on Machine Translation and the WER of Whisper foundation models on Speech Recognition. Results display consistent improvements in performance evaluated through COMET and WER, respectively. Furthermore, experiments also show that the proxies are robust to the exact nature of the data used to train them and can extend to other domains.

[49]  arXiv:2405.01603 [pdf, other]
Title: KITE: A Kernel-based Improved Transferability Estimation Method
Authors: Yunhui Guo
Comments: 14 pages
Subjects: Machine Learning (cs.LG)

Transferability estimation has emerged as an important problem in transfer learning. A transferability estimation method takes as input a set of pre-trained models and decides which pre-trained model can deliver the best transfer learning performance. Existing methods tackle this problem by analyzing the output of the pre-trained model or by comparing the pre-trained model with a probe model trained on the target dataset. However, neither is sufficient to provide reliable and efficient transferability estimations. In this paper, we present a novel perspective and introduce Kite, a Kernel-based Improved Transferability Estimation method. Kite is based on the key observations that the separability of the pre-trained features and the similarity of the pre-trained features to random features are two important factors for estimating transferability. Inspired by kernel methods, Kite adopts centered kernel alignment as an effective way to assess feature separability and feature similarity. Kite is easy to interpret, fast to compute, and robust to the target dataset size. We evaluate the performance of Kite on a recently introduced large-scale model selection benchmark. The benchmark contains 8 source datasets, 6 target datasets, and 4 architectures, with a total of 32 pre-trained models. Extensive results show that Kite outperforms existing methods by a large margin for transferability estimation.
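
Centered kernel alignment itself is easy to state. The sketch below computes linear CKA between two feature matrices (rows are samples) in plain Python; comparing pre-trained features with one-hot labels, or with random features, would give the kind of separability and similarity signals the abstract mentions. The choice of a linear kernel is an assumption, and how Kite combines these scores into a transferability estimate is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def gram(X: Matrix) -> Matrix:
    # Linear kernel: K[i][j] = <x_i, x_j>
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def center(K: Matrix) -> Matrix:
    # Computes H K H with H = I - (1/n) 11^T, via row, column, and grand means.
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n)] for i in range(n)]


def frobenius(A: Matrix, B: Matrix) -> float:
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def linear_cka(X: Matrix, Y: Matrix) -> float:
    """CKA = HSIC(K, L) / sqrt(HSIC(K, K) * HSIC(L, L)); HSIC's scaling constants cancel."""
    Kc, Lc = center(gram(X)), center(gram(Y))
    return frobenius(Kc, Lc) / math.sqrt(frobenius(Kc, Kc) * frobenius(Lc, Lc))
```

Linear CKA lies in [0, 1] and is invariant to orthogonal transformations and isotropic scaling of either argument, so rotating a feature matrix leaves its CKA with the original at 1.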

[50]  arXiv:2405.01607 [pdf, other]
Title: Wildfire Risk Prediction: A Review
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Wildfires have significant impacts on global vegetation, wildlife, and humans. They destroy plant communities and wildlife habitats and contribute to increased emissions of carbon dioxide, nitrogen oxides, methane, and other pollutants. The prediction of wildfires relies on various independent variables combined with regression or machine learning methods. In this technical review, we describe the options for independent variables, data processing techniques, models, methods for estimating the collinearity and importance of independent variables, and model performance evaluation metrics. First, we divide the independent variables into 4 aspects: climate and meteorological conditions, socio-economic factors, terrain and hydrological features, and wildfire historical records. Second, we describe preprocessing methods for data of different magnitudes, spatial-temporal resolutions, and formats. Third, we consider methods for evaluating the collinearity and importance of independent variables. Fourth, we discuss the application of statistical models, traditional machine learning models, and deep learning models in wildfire risk prediction. In this section, compared with other reviews, this manuscript particularly discusses the evaluation metrics and recent advancements in deep learning methods. Lastly, addressing the limitations of current research, this paper emphasizes the need for more effective deep learning time series forecasting algorithms, the utilization of three-dimensional data including ground and trunk fuel, extraction of more accurate historical fire point data, and improved model evaluation metrics.

[51]  arXiv:2405.01608 [pdf, ps, other]
Title: A Comprehensive Study on Automated Testing with the Software Lifecycle
Comments: 9
Subjects: Software Engineering (cs.SE)

The software development lifecycle depends heavily on the testing process, which is essential for finding issues and reviewing software quality. Software testing can be done in two ways: manually and automatically. Emphasizing its primary function within the software lifecycle, the relevance of testing in general, and the advantages it brings, this article aims to give a thorough review of automated testing and to identify time- and cost-effective methods for software testing. The research examines how automated testing makes it easier to evaluate software quality, how much time it saves compared with manual testing, and how the two approaches compare in terms of benefits and drawbacks. Automated testing tools simplify the testing of software applications, allow testing to be customized to specific situations, and enable it to be carried out successfully.

[52]  arXiv:2405.01609 [pdf, ps, other]
Title: Q-learning-based Opportunistic Communication for Real-time Mobile Air Quality Monitoring Systems
Comments: 2021 IEEE International Conference on Performance, Computing and Communications (IPCCC). arXiv admin note: substantial text overlap with arXiv:2405.01057
Subjects: Networking and Internet Architecture (cs.NI)

In this research, we focus on real-time air quality monitoring systems that rely on devices installed on automobiles. We investigate an opportunistic communication model in which devices can send measured data either directly to the air quality server through a 4G communication channel or via Wi-Fi to adjacent devices or to Road Side Units deployed along the road. We aim to reduce 4G costs while meeting data latency requirements, where data latency is defined as the time it takes for data to reach the server. We propose an offloading scheme that leverages Q-learning to achieve this goal. Experimental results show that our offloading method reduces 4G communication cost by around 40-50% while keeping the latency of 99.5% of packets below the required threshold.
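
The core of a Q-learning offloading policy is the tabular update Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]. The toy sketch below uses it to choose between 4G and Wi-Fi. The two-state model (whether a Road Side Unit or neighbouring device is in Wi-Fi range), the reward values, and all hyperparameters are hypothetical and not taken from the paper.

```python
import random
from typing import Dict, List

ACTIONS = ("4g", "wifi")


def reward(state: int, action: int) -> float:
    # Hypothetical costs. state 1: an RSU or neighbour is in Wi-Fi range; state 0: none is.
    if action == 0:                      # 4G always delivers on time but costs money
        return -1.0
    return 0.0 if state == 1 else -5.0   # Wi-Fi is free, but misses the deadline without a relay


def train(steps: int = 5000, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]        # q[state][action]
    state = rng.randint(0, 1)
    for _ in range(steps):
        if rng.random() < epsilon:       # epsilon-greedy exploration
            action = rng.randint(0, 1)
        else:
            action = max((0, 1), key=lambda a: q[state][a])
        r = reward(state, action)
        next_state = rng.randint(0, 1)   # coverage changes as the vehicle moves
        td_target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q: List[List[float]]) -> Dict[int, str]:
    return {s: ACTIONS[max((0, 1), key=lambda a: q[s][a])] for s in (0, 1)}
```

After training, the greedy policy offloads over Wi-Fi when a relay is in range and falls back to 4G otherwise, which is the cost/latency trade-off the abstract targets, reduced to its simplest form.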

[53]  arXiv:2405.01610 [pdf, other]
Title: Automating the Analysis of Public Saliency and Attitudes towards Biodiversity from Digital Media
Comments: v0.1, 21 pages with 10 figures
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Measuring public attitudes toward wildlife provides crucial insights into our relationship with nature and helps monitor progress toward Global Biodiversity Framework targets. Yet, conducting such assessments at a global scale is challenging. Manually curating search terms for querying news and social media is tedious, costly, and can lead to biased results. Raw news and social media data returned from queries are often cluttered with irrelevant content and syndicated articles. We aim to overcome these challenges by leveraging modern Natural Language Processing (NLP) tools. We introduce a folk taxonomy approach for improved search term generation and employ cosine similarity on Term Frequency-Inverse Document Frequency vectors to filter syndicated articles. We also introduce an extensible relevance filtering pipeline which uses unsupervised learning to reveal common topics, followed by an open-source zero-shot Large Language Model (LLM) to assign topics to news article titles, which are then used to assign relevance. Finally, we conduct sentiment, topic, and volume analyses on resulting data. We illustrate our methodology with a case study of news and X (formerly Twitter) data before and during the COVID-19 pandemic for various mammal taxa, including bats, pangolins, elephants, and gorillas. During the data collection period, up to 62% of articles including keywords pertaining to bats were deemed irrelevant to biodiversity, underscoring the importance of relevance filtering. At the pandemic's onset, we observed increased volume and a significant sentiment shift toward horseshoe bats, which were implicated in the pandemic, but not for other focal taxa. The proposed methods open the door to conservation practitioners applying modern and emerging NLP tools, including LLMs "out of the box," to analyze public perceptions of biodiversity during current events or campaigns.
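
The syndication filter can be sketched as follows: build TF-IDF vectors, then keep an article only if its cosine similarity to every already-kept article stays below a threshold. The tokenizer, the smoothed IDF formula, and the 0.9 threshold are assumptions for illustration, not the paper's settings.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant); rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: cnt * idf[t] for t, cnt in Counter(toks).items()} for toks in tokenized]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def drop_syndicated(docs: List[str], threshold: float = 0.9) -> List[int]:
    """Indices of articles kept after removing near-duplicates of earlier ones."""
    vecs = tfidf_vectors(docs)
    kept: List[int] = []
    for i, v in enumerate(vecs):
        if all(cosine(v, vecs[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

A greedy pass like this keeps the first copy of each syndicated story; the threshold trades missed duplicates against wrongly merged distinct articles.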

[54]  arXiv:2405.01611 [pdf, other]
Title: Unifying and extending Precision Recall metrics for assessing generative models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)

With the recent success of generative models in image and text, the evaluation of generative models has gained a lot of attention. Whereas most generative models are compared in terms of scalar values such as Frechet Inception Distance (FID) or Inception Score (IS), Sajjadi et al. (2018) proposed a definition of a precision-recall curve to characterize the closeness of two distributions. Since then, various approaches to precision and recall have been proposed (Kynkaanniemi et al., 2019; Naeem et al., 2020; Park & Kim, 2023). They focus on the extreme values of precision and recall, but beyond this, the connections between them remain unclear. In this paper, we unify most of these approaches under the same umbrella, relying on the work of Simon et al. (2019). In doing so, we are able not only to recover entire curves, but also to expose the sources of the reported pitfalls of these metrics. We also provide consistency results that go well beyond those presented in the corresponding literature. Last, we study the different behaviors of the curves obtained experimentally.
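
As a concrete reference point for one of the metrics being unified, the sketch below implements k-nearest-neighbour precision and recall in the style of Kynkaanniemi et al. (2019). Precision is the fraction of generated samples that fall inside the union of k-NN balls around real samples, and recall swaps the roles. It is a plain-Python illustration with Euclidean distance and a hypothetical default k=3, not the unified framework proposed in the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_radii(points: List[Point], k: int) -> List[float]:
    # Radius of each point's ball: distance to its k-th nearest other point.
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def manifold_coverage(queries: List[Point], support: List[Point], k: int) -> float:
    # Fraction of queries lying inside the union of k-NN balls around the support set.
    radii = knn_radii(support, k)
    inside = sum(
        any(math.dist(q, s) <= r for s, r in zip(support, radii)) for q in queries
    )
    return inside / len(queries)


def precision_recall(real: List[Point], generated: List[Point], k: int = 3) -> Tuple[float, float]:
    precision = manifold_coverage(generated, real, k)
    recall = manifold_coverage(real, generated, k)
    return precision, recall
```

Using only part of the real set as "generated" samples leaves precision at 1.0 but lowers recall, the usual signature of mode dropping; a sample set shifted far away scores 0.0 on both.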

[55]  arXiv:2405.01612 [pdf, ps, other]
Title: Effective Delegation and Leadership in Software Management
Comments: 9 pages
Subjects: Software Engineering (cs.SE)

Delegation and leadership are critical components of software management, as they play a crucial role in determining the success of the software development process. This study examined the relationship between delegation and leadership in software management and the impact of these factors on project outcomes. Results showed that effective delegation and transformational leadership styles can improve workflow, enhance team motivation and productivity, and ultimately lead to successful software development projects. The findings of this study have important implications for software management practices, as they suggest that organizations and software managers should prioritize the development of effective delegation and leadership practices to ensure the success of their software development initiatives. Further research is needed to explore the complex interplay between delegation and leadership in software management and to identify best practices for improving these processes.

[56]  arXiv:2405.01614 [pdf, other]
Title: A probabilistic estimation of remaining useful life from censored time-to-event data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Predicting the remaining useful life (RUL) of ball bearings plays an important role in predictive maintenance. A common definition of the RUL is the time until a bearing is no longer functional, which we denote as an event, and many data-driven methods have been proposed to predict the RUL. However, few studies have addressed the problem of censored data, where this event of interest is not observed, and simply ignoring these observations can lead to an overestimation of the failure risk. In this paper, we propose a probabilistic estimation of RUL using survival analysis that supports censored data. First, we analyze sensor readings from ball bearings in the frequency domain and annotate when a bearing starts to deteriorate by calculating the Kullback-Leibler (KL) divergence between the probability density function (PDF) of the current process and a reference PDF. Second, we train several survival models on the annotated bearing dataset, capable of predicting the RUL over a finite time horizon using the survival function. This function is guaranteed to be strictly monotonically decreasing and is an intuitive estimation of the remaining lifetime. We demonstrate our approach on the XJTU-SY dataset using cross-validation and find that Random Survival Forests consistently outperform both non-neural and neural network models in terms of the mean absolute error (MAE). Our work encourages the inclusion of censored data in predictive maintenance models and highlights the unique advantages that survival analysis offers when it comes to probabilistic RUL estimation and early fault detection.
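
The survival models in the paper (e.g., Random Survival Forests) are more elaborate, but the way censoring enters a survival function can be seen in the Kaplan-Meier estimator, swapped in here as the simplest illustration: censored bearings leave the risk set without counting as failures. The durations and censoring flags in the usage example are made up.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], observed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct time with at least one observed failure.

    observed[i] is False for a censored unit (e.g. a bearing still working when
    the test ended): it stays in the risk set until its time but never counts as a failure.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        failures = removed = 0
        while i < len(pairs) and pairs[i][0] == t:
            failures += int(pairs[i][1])
            removed += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

For durations [2, 3, 3, 5, 7] with the second 3 and the 7 censored, the curve is approximately [(2, 0.8), (3, 0.6), (5, 0.3)]. Simply discarding the two censored bearings would instead drive S(5) to 0, overstating the failure risk as the abstract warns.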

[57]  arXiv:2405.01615 [pdf, other]
Title: Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning
Comments: 16 pages, including proofs in the appendix
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Evolution Strategies (ES) have emerged as a competitive alternative for model-free reinforcement learning, showcasing exemplary performance in tasks like Mujoco and Atari. Notably, they shine in scenarios with imperfect reward functions, making them invaluable for real-world applications where dense reward signals may be elusive. Yet, an inherent assumption in ES, that all input features are task-relevant, poses challenges, especially when confronted with irrelevant features common in real-world problems. This work scrutinizes this limitation, particularly focusing on the Natural Evolution Strategies (NES) variant. We propose NESHT, a novel approach that integrates Hard-Thresholding (HT) with NES to champion sparsity, ensuring only pertinent features are employed. Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy Mujoco and Atari tasks.
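
A minimal sketch of the idea, assuming a simplified variant: estimate the NES search gradient with antithetic Gaussian perturbations, take an ascent step, then hard-threshold the parameters to keep only the k largest magnitudes. The toy reward depends on only two of six parameters; the reward, k, and all hyperparameters are hypothetical, and this is not NESHT's exact update rule.

```python
import random
from typing import Callable, List


def nes_hard_threshold(
    reward: Callable[[List[float]], float],
    dim: int,
    k: int,
    iters: int = 200,
    pop: int = 200,
    sigma: float = 0.1,
    lr: float = 0.05,
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iters):
        # NES search-gradient estimate with antithetic (mirrored) perturbations.
        grad = [0.0] * dim
        for _ in range(pop):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            plus = reward([t + sigma * e for t, e in zip(theta, eps)])
            minus = reward([t - sigma * e for t, e in zip(theta, eps)])
            scale = (plus - minus) / (2.0 * sigma * pop)
            for i in range(dim):
                grad[i] += scale * eps[i]
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient ascent step
        # Hard thresholding: keep the k largest-magnitude entries, zero the rest.
        keep = set(sorted(range(dim), key=lambda i: abs(theta[i]), reverse=True)[:k])
        theta = [t if i in keep else 0.0 for i, t in enumerate(theta)]
    return theta


def toy_reward(theta: List[float]) -> float:
    # Only theta[0] and theta[1] matter; the remaining entries are irrelevant features.
    return -(theta[0] - 3.0) ** 2 - (theta[1] + 2.0) ** 2
```

Without the thresholding step, the irrelevant coordinates would keep absorbing gradient-estimate noise; with k=2 they are zeroed every iteration and the solution stays sparse.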

[58]  arXiv:2405.01617 [pdf, other]
Title: An Explainable and Conformal AI Model to Detect Temporomandibular Joint Involvement in Children Suffering from Juvenile Idiopathic Arthritis
Comments: Accepted at EMBC 2024
Subjects: Machine Learning (cs.LG)

Juvenile idiopathic arthritis (JIA) is the most common rheumatic disease during childhood and adolescence. The temporomandibular joints (TMJ) are among the most frequently affected joints in patients with JIA, and mandibular growth is especially vulnerable to arthritic changes of the TMJ in children. A clinical examination is the most cost-effective method to diagnose TMJ involvement, but clinicians find it difficult to interpret, and it is inaccurate when used as the sole diagnostic method. This study implemented an explainable artificial intelligence (AI) model that can help clinicians assess TMJ involvement. The classification model was trained using Random Forest on 6154 clinical examinations of 1035 pediatric patients (67% female, 33% male) and evaluated on its ability to correctly classify the presence or absence of TMJ involvement on a separate test set. Most notably, the results show that the model can classify patients within two years of their first examination as having TMJ involvement with a precision of 0.86 and a sensitivity of 0.7. The results show promise for an AI model in the assessment of TMJ involvement in children and as a decision support tool.

[59]  arXiv:2405.01618 [pdf, ps, other]
Title: Matter: IoT Interoperability for Smart Homes
Comments: 7 pages
Subjects: Networking and Internet Architecture (cs.NI)

The smart home is a major Internet of Things (IoT) application domain with tremendous market expectations. However, communication solutions for smart home devices have exhibited a lack of interoperability, especially, but not only, at the highest layers of the protocol stack. This issue challenges the success of the smart home concept. In order to overcome this problem, crucial industry organizations, including Google, Apple, Amazon and the Connectivity Standards Alliance (formerly, the ZigBee Alliance) have collaborated to produce Matter, a connectivity solution intended to become a universal standard for the smart home. This paper overviews, evaluates and discusses Matter, focusing on its design, features, performance, and potential future directions.

[60]  arXiv:2405.01619 [pdf, other]
Title: An Efficient Finite Element Solver for a Nonuniform size-modified Poisson-Nernst-Planck Ion Channel Model
Authors: Dexuan Xie
Comments: 24 pages, 5 figures, 3 tables
Subjects: Numerical Analysis (math.NA); Biological Physics (physics.bio-ph)

This paper presents an efficient finite element iterative method for solving a nonuniform size-modified Poisson-Nernst-Planck ion channel (SMPNPIC) model, along with an SMPNPIC program package that works for an ion channel protein with a three-dimensional crystallographic structure and an ionic solvent with multiple ionic species. In particular, the SMPNPIC model is constructed and then reformulated by novel mathematical techniques so that each iteration of the method involves only linear boundary value problems and nonlinear algebraic systems, circumventing the numerical difficulties caused by the strong nonlinearities, strong asymmetries, and strong differential equation coupling of the SMPNPIC model. To further improve the method's efficiency, an efficient modified Newton iterative method is adapted to the numerical solution of each related nonlinear algebraic system. Numerical results for a voltage-dependent anion channel (VDAC) and a mixture solution of four ionic species demonstrate the method's convergence, the package's high performance, and the importance of considering nonuniform ion size effects. They also partially validate the SMPNPIC model by the anion selectivity property of VDAC.

[61]  arXiv:2405.01636 [pdf, other]
Title: Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey
Comments: 35 pages, 9 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Explainable Artificial Intelligence (XAI) has found numerous applications in computer vision. While explainability techniques for image classification have garnered significant attention, their counterparts in semantic segmentation have been relatively neglected. Given the prevalent use of image segmentation, ranging from medical to industrial deployments, these techniques warrant a systematic look. In this paper, we present the first comprehensive survey on XAI in semantic image segmentation. This work focuses on techniques that were either specifically introduced for dense prediction tasks or were extended for them by modifying existing methods in classification. We analyze and categorize the literature based on application categories and domains, as well as the evaluation metrics and datasets used. We also propose a taxonomy for interpretable semantic segmentation, and discuss potential challenges and future research directions.

[62]  arXiv:2405.01646 [pdf, other]
Title: Explaining models relating objects and privacy
Comments: 7 pages, 3 figures, 1 table, supplementary material included as Appendix. Paper accepted at the 3rd XAI4CV Workshop at CVPR 2024. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurately predicting whether an image is private before sharing it online is difficult due to the vast variety of content and the subjective nature of privacy itself. In this paper, we evaluate privacy models that use objects extracted from an image to determine why the image is predicted as private. To explain the decisions of these models, we use feature attribution to identify and quantify which objects (and which of their features) are more relevant to privacy classification with respect to a reference input (i.e., no objects localised in an image) predicted as public. We show that the presence of the person category and its cardinality is the main factor in the privacy decision. Therefore, these models mostly fail to identify private images depicting documents with sensitive data, vehicle ownership, and internet activity, or public images with people (e.g., an outdoor concert or people walking in a public space next to a famous landmark). As baselines for future benchmarks, we also devise two strategies that are based on person presence and cardinality and achieve classification performance comparable to that of the privacy models.

[63]  arXiv:2405.01649 [pdf, other]
Title: Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning
Comments: arXiv admin note: text overlap with arXiv:2305.01157, arXiv:2212.09567 by other authors
Subjects: Computation and Language (cs.CL)

Answering complex logical queries over incomplete knowledge graphs (KGs) is challenging. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, resulting in suboptimal performance. In this paper, we propose a complex logical reasoning schema over knowledge graphs built on large language models (LLMs), containing a curriculum-based logic-aware instruction tuning framework named LACT. Specifically, we augment arbitrary first-order logical queries via binary tree decomposition to stimulate the reasoning capability of LLMs. To address the difficulty gap among different types of complex queries, we design a simple and flexible logic-aware curriculum learning framework. Experiments across widely used datasets demonstrate that LACT achieves substantial improvements (an average gain of +5.5% in MRR) over advanced methods, establishing a new state of the art. Our code and model will be released on GitHub and Hugging Face soon.

[64]  arXiv:2405.01652 [pdf, ps, other]
Title: On one-orbit cyclic subspace codes of $\mathcal{G}_q(n,3)$
Comments: Accepted to the 2024 IEEE International Symposium on Information Theory (ISIT 2024)
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

Subspace codes have recently been used for error correction in random network coding. In this work, we focus on one-orbit cyclic subspace codes. If $S$ is an $\mathbb{F}_q$-subspace of $\mathbb{F}_{q^n}$, then the one-orbit cyclic subspace code defined by $S$ is \[ \mathrm{Orb}(S)=\{\alpha S \colon \alpha \in \mathbb{F}_{q^n}^*\}, \] where $\alpha S=\lbrace \alpha s \colon s\in S\rbrace$ for any $\alpha\in \mathbb{F}_{q^n}^*$. Few classification results of subspace codes are known, so it is natural to initiate a classification of cyclic subspace codes, especially in light of the recent classification of the isometries for cyclic subspace codes. We consider three-dimensional one-orbit cyclic subspace codes, which are divided into three families: the first one containing only $\mathrm{Orb}(\mathbb{F}_{q^3})$; the second one containing the optimum-distance codes; and the third one whose elements are codes with minimum distance $2$. We study inequivalent codes in the latter two families.
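
Orbits can be enumerated by brute force for small parameters. The sketch below works in $\mathbb{F}_{2^6}$ (q = 2, n = 6), represented with the irreducible polynomial x^6 + x + 1 (a choice made for illustration), and compares $\mathrm{Orb}(\mathbb{F}_{2^3})$ with the orbit of a three-dimensional subspace that is not a multiple of the subfield. The subspace distance used is the standard d(U, V) = dim U + dim V - 2 dim(U ∩ V); the helper names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

N = 6                  # work in F_{2^6}; elements are stored as 6-bit integers
MODULUS = 0b1000011    # x^6 + x + 1, irreducible over F_2 (an illustrative choice)


def gf_mul(a: int, b: int) -> int:
    """Carry-less multiplication of two field elements, reduced modulo MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= MODULUS
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def span(generators: Iterable[int]) -> FrozenSet[int]:
    """All F_2-linear combinations (XOR sums) of the given field elements."""
    space = {0}
    for g in generators:
        space |= {v ^ g for v in space}
    return frozenset(space)


def orbit(S: FrozenSet[int]) -> Set[FrozenSet[int]]:
    """Orb(S) = {alpha * S : alpha a nonzero element of F_{2^6}}."""
    return {frozenset(gf_mul(alpha, s) for s in S) for alpha in range(1, 2 ** N)}


def dim(W: FrozenSet[int]) -> int:
    return len(W).bit_length() - 1  # |W| = 2^dim over F_2


def min_distance(code: Set[FrozenSet[int]]) -> int:
    # Subspace distance dim U + dim V - 2 dim(U & V), minimised over distinct codewords.
    return min(dim(U) + dim(V) - 2 * dim(U & V) for U, V in combinations(code, 2))


F8 = frozenset(x for x in range(2 ** N) if gf_pow(x, 8) == x)  # the subfield F_{2^3}
GENERIC = span([0b1, 0b10, 0b100])                              # span{1, x, x^2}
```

Here `len(orbit(F8))` is 9 = (2^6 - 1)/(2^3 - 1), and any two codewords meet only in 0, so the minimum distance is 6; the orbit of span{1, x, x^2} has the full 63 members because only the scalar 1 fixes it.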

[65]  arXiv:2405.01654 [pdf, other]
Title: Key Patches Are All You Need: A Multiple Instance Learning Framework For Robust Medical Diagnosis
Comments: Accepted in DEF-AI-MIA Workshop@CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning models have revolutionized the field of medical image analysis due to their outstanding performance. However, they are sensitive to spurious correlations, often taking advantage of dataset bias to improve results on in-domain data while jeopardizing their generalization capabilities. In this paper, we propose to limit the amount of information these models use to reach the final classification by using a multiple instance learning (MIL) framework. MIL forces the model to use only a (small) subset of patches in the image, identifying discriminative regions. This mimics clinical procedures, where medical decisions are based on localized findings. We evaluate our framework on two medical applications: skin cancer diagnosis using dermoscopy and breast cancer diagnosis using mammography. Our results show that using only a subset of the patches does not compromise diagnostic performance on in-domain data, compared to the baseline approaches. Moreover, our approach is more robust to shifts in patient demographics, while also providing more detailed explanations about which regions contributed to the decision. Code is available at: https://github.com/diogojpa99/MedicalMultiple-Instance-Learning.
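
A minimal sketch of top-k instance pooling, one common way to make a MIL classifier rely on a small subset of patches: split the image into patches, score each, and average only the k highest scores, returning the chosen patch indices as a simple explanation. The mean-intensity patch scorer and the top-k mean are assumptions standing in for the learned model and the pooling actually used in the paper.

```python
from typing import List, Tuple

Image = List[List[float]]


def split_into_patches(image: Image, size: int) -> List[Image]:
    # Non-overlapping size x size patches in row-major order; ragged borders are dropped.
    h, w = len(image), len(image[0])
    return [
        [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, h - size + 1, size)
        for c in range(0, w - size + 1, size)
    ]


def toy_patch_score(patch: Image) -> float:
    # Hypothetical stand-in for a learned patch classifier: mean intensity.
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def mil_topk(patch_scores: List[float], k: int) -> Tuple[float, List[int]]:
    # Bag (image) score = mean of the k highest patch scores; indices double as an explanation.
    ranked = sorted(range(len(patch_scores)), key=lambda i: patch_scores[i], reverse=True)
    chosen = ranked[:k]
    return sum(patch_scores[i] for i in chosen) / len(chosen), sorted(chosen)
```

Because the image-level score depends only on the selected patches, everything outside them, including background shortcuts, cannot influence the decision, which is the intuition behind the robustness claim.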

[66]  arXiv:2405.01656 [pdf, other]
Title: S4: Self-Supervised Sensing Across the Spectrum
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Satellite image time series (SITS) segmentation is crucial for many applications, such as environmental monitoring, land cover mapping, and agricultural crop type classification. However, training models for SITS segmentation remains challenging due to the lack of abundant training data, which requires fine-grained annotation. We propose S4, a new self-supervised pre-training approach that significantly reduces the requirement for labeled training data by utilizing two new insights: (a) satellites capture images in different parts of the spectrum, such as radio frequencies and visible frequencies; (b) satellite imagery is geo-registered, allowing for fine-grained spatial alignment. We use these insights to formulate pre-training tasks in S4. We also curate m2s2-SITS, a large-scale dataset of unlabeled, spatially aligned, multi-modal, and geographic-specific SITS that serves as representative pre-training data for S4. Finally, we evaluate S4 on multiple SITS segmentation datasets and demonstrate its efficacy against competing baselines while using limited labeled data.

[67]  arXiv:2405.01660 [pdf, other]
Title: Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts
Comments: Accepted to *SEM 2024 (StarSEM) conference
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recent Large Language Models (LLMs) have shown the ability to generate content that is difficult or impossible to distinguish from human writing. We investigate the ability of differently-sized LLMs to replicate human writing style in short, creative texts in the domain of Showerthoughts, thoughts that may occur during mundane activities. We compare GPT-2 and GPT-Neo fine-tuned on Reddit data as well as GPT-3.5 invoked in a zero-shot manner, against human-authored texts. We measure human preference on the texts across the specific dimensions that account for the quality of creative, witty texts. Additionally, we compare the ability of humans versus fine-tuned RoBERTa classifiers to detect AI-generated texts. We conclude that human evaluators rate the generated texts slightly worse on average regarding their creative quality, but they are unable to reliably distinguish between human-written and AI-generated texts. We further provide a dataset for creative, witty text generation based on Reddit Showerthoughts posts.

[68]  arXiv:2405.01661 [pdf, other]
Title: When a Relation Tells More Than a Concept: Exploring and Evaluating Classifier Decisions with CoReX
Comments: preliminary version, submitted to Machine Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Explanations for Convolutional Neural Networks (CNNs) based on the relevance of input pixels might be too unspecific to evaluate which input features impact model decisions, and how. Especially in complex real-world domains like biomedicine, the presence of specific concepts (e.g., a certain type of cell) and of relations between concepts (e.g., one cell type is next to another) might be discriminative between classes (e.g., different types of tissue). Pixel relevance is not expressive enough to convey this type of information. Consequently, model evaluation is limited, and relevant aspects present in the data and influencing the model decisions might be overlooked. This work presents a novel method to explain and evaluate CNN models, which uses a concept- and relation-based explainer (CoReX). It explains the predictive behavior of a model on a set of images by masking (ir-)relevant concepts from the decision-making process and by constraining relations in a learned interpretable surrogate model. We test our approach with several image data sets and CNN architectures. Results show that CoReX explanations are faithful to the CNN model in terms of predictive outcomes. We further demonstrate that CoReX is a suitable tool for evaluating CNNs, supporting the identification and re-classification of incorrect or ambiguous classifications.

[69]  arXiv:2405.01662 [pdf, other]
Title: Out-of-distribution detection based on subspace projection of high-dimensional features output by the last convolutional layer
Authors: Qiuyu Zhu, Yiwei He
Comments: 10 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Out-of-distribution (OOD) detection, crucial for reliable pattern classification, discerns whether a sample originates outside the training distribution. This paper concentrates on the high-dimensional features output by the final convolutional layer, which contain rich image features. Our key idea is to project these high-dimensional features into two specific feature subspaces, leveraging the dimensionality reduction capacity of the network's linear layers, trained with Predefined Evenly-Distribution Class Centroids (PEDCC)-Loss. This involves calculating the cosines of three projection angles and the norm values of features, thereby identifying distinctive information for in-distribution (ID) and OOD data, which assists in OOD detection. Building upon this, we have modified the batch normalization (BN) and ReLU layers preceding the fully connected layer, diminishing their impact on the output feature distributions and thereby widening the distribution gap between ID and OOD data features. Our method requires only the training of the classification network model, eschewing any need for input pre-processing or specific OOD data pre-tuning. Extensive experiments on several benchmark datasets demonstrate that our approach delivers state-of-the-art performance. Our code is available at https://github.com/Hewell0/ProjOOD.
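
The projection-angle idea can be illustrated with a single subspace: project a feature vector onto the span of an orthonormal basis and take the cosine of the angle between the vector and its projection, alongside the feature norm. This sketch assumes an orthonormal basis is available (for example, obtained by orthonormalizing a linear layer's weight rows); the paper's three specific angles and its PEDCC-trained subspaces are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def project(x: Vector, basis: List[Vector]) -> List[float]:
    """Orthogonal projection of x onto span(basis); assumes the basis vectors are orthonormal."""
    coeffs = [sum(a * b for a, b in zip(x, u)) for u in basis]
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(len(x))]


def norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def projection_features(x: Vector, basis: List[Vector]) -> Tuple[float, float]:
    """Return (cosine of the angle between x and its projection, ||x||)."""
    p = project(x, basis)
    # For an orthogonal projection x . p = ||p||^2, so cos(angle) = ||p|| / ||x||.
    return norm(p) / norm(x), norm(x)
```

A feature lying inside the subspace gets a cosine of 1, and one orthogonal to it gets 0; a detector built on this idea would treat low cosines (together with atypical norms) as evidence of OOD inputs.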

[70]  arXiv:2405.01663 [pdf, ps, other]
Title: ATNPA: A Unified View of Oversmoothing Alleviation in Graph Neural Networks
Comments: 16 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Oversmoothing is a commonly observed challenge in graph neural network (GNN) learning, where, as layers increase, embedding features learned by GNNs quickly become similar or indistinguishable, making them incapable of differentiating network proximity. A GNN with a shallow architecture can only learn short-range relations or localized structure information, limiting its power to learn long-range connections, as evidenced by its inferior learning performance on heterophilous graphs. Tackling oversmoothing is crucial to harness deep architectures for GNNs. To date, many methods have been proposed to alleviate oversmoothing. The vast differences between their design principles, combined with graph complications, make it difficult to understand and even compare how they tackle oversmoothing. In this paper, we propose ATNPA, a unified view with five key steps: Augmentation, Transformation, Normalization, Propagation, and Aggregation, to summarize GNN oversmoothing alleviation approaches. We first outline three themes to tackle oversmoothing, and then separate all methods into six categories, followed by detailed reviews of representative methods, including their relation to ATNPA, and a discussion of their niches, strengths, and weaknesses. The review not only provides an in-depth understanding of existing methods in the field, but also lays out a clear road map for future study.
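
As an illustration of how the five ATNPA steps compose, here is a toy Python sketch on a graph with scalar node features. Each step is filled in with one common choice, which is an assumption for illustration rather than any specific method from the review: self-loops (Augmentation), a scalar weight (Transformation), symmetric degree normalization (Normalization), repeated neighbor averaging (Propagation), and an APPNP-style mix with the initial features (Aggregation).

```python
import math
from typing import List, Sequence


def atnpa_toy(adj: Sequence[Sequence[float]], x: Sequence[float],
              alpha: float, steps: int, weight: float = 1.0) -> List[float]:
    n = len(adj)
    # Augmentation: add self-loops to the adjacency matrix.
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Transformation: a scalar stands in for a learned weight matrix.
    h0 = [weight * xi for xi in x]
    # Normalization: symmetric D^{-1/2} (A + I) D^{-1/2}.
    deg = [sum(row) for row in a]
    a_hat = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Propagation and Aggregation: each step propagates over the graph and
    # mixes the result with the initial features (APPNP-style teleport).
    h = list(h0)
    for _ in range(steps):
        propagated = [sum(a_hat[i][j] * h[j] for j in range(n)) for i in range(n)]
        h = [(1.0 - alpha) * p + alpha * h0_i for p, h0_i in zip(propagated, h0)]
    return h
```

With `alpha = 0`, features on a regular graph (for example, a 4-cycle) collapse to their mean after enough steps, which is oversmoothing in miniature; with `alpha > 0`, the mix with the initial features keeps the nodes distinguishable.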

[71]  arXiv:2405.01668 [pdf, other]
Title: WitheredLeaf: Finding Entity-Inconsistency Bugs with LLMs
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Originating from semantic bugs, Entity-Inconsistency Bugs (EIBs) involve misuse of syntactically valid yet incorrect program entities, such as variable identifiers and function names, which often have security implications. Unlike straightforward syntactic vulnerabilities, EIBs are subtle and can remain undetected for years. Traditional detection methods, such as static analysis and dynamic testing, often fall short due to the versatile and context-dependent nature of EIBs. However, with advancements in Large Language Models (LLMs) like GPT-4, we believe LLM-powered automatic EIB detection is becoming increasingly feasible through these models' semantic understanding abilities. This research first undertakes a systematic measurement of LLMs' capabilities in detecting EIBs, revealing that GPT-4, while promising, shows limited recall and precision that hinder its practical application. The primary problem lies in the model's tendency to focus on irrelevant code snippets devoid of EIBs. To address this, we introduce a novel, cascaded EIB detection system named WitheredLeaf, which leverages smaller, code-specific language models to filter out most negative cases and mitigate the problem, thereby significantly enhancing the overall precision and recall. We evaluated WitheredLeaf on 154 Python and C GitHub repositories, each with over 1,000 stars, identifying 123 new flaws, 45% of which can be exploited to disrupt the program's normal operations. Out of 69 submitted fixes, 27 have been successfully merged.

[72]  arXiv:2405.01673 [pdf, other]
Title: ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness
Comments: 21 pages, 13 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

The ability to autonomously determine the pose of a rover in an inertial frame is a crucial capability for the next generation of surface rover missions on other planetary bodies. Currently, most ongoing rover missions rely on ground-in-the-loop interventions to manually correct for drift in the pose estimate, and this human supervision limits the distance over which rovers can operate autonomously and carry out scientific measurements. In this paper, we present ShadowNav, an autonomous approach for global localization on the Moon with an emphasis on driving in darkness and at nighttime. Our approach uses the leading edges of Lunar craters as landmarks, and a particle filter associates detected craters with known ones on an offboard map. We discuss the key design decisions in developing the ShadowNav framework for use with a Lunar rover concept equipped with a stereo camera and an external illumination source. Finally, we demonstrate the efficacy of our proposed approach both in a Lunar simulation environment and on data collected during a field test at Cinder Lakes, Arizona.
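
A heavily simplified Python sketch of the crater-association idea follows: a single measurement update of a particle filter over the rover's 2D position, where each detected crater offset is matched to the nearest crater in a known map. Assumptions not taken from the paper: known heading, detections already expressed in the map frame, uniform initial particles over a square area, a Gaussian likelihood with a hypothetical `sigma`, and no motion model or resampling.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def localize(crater_map: Sequence[Point], detections: Sequence[Point],
             n_particles: int = 2000, area: float = 10.0,
             sigma: float = 0.3, seed: int = 0) -> Point:
    """One measurement update of a particle filter over rover (x, y).

    `detections` are crater offsets (dx, dy) from the rover, assumed to be
    expressed in the map frame (i.e. heading is known)."""
    rng = random.Random(seed)
    particles = [(rng.uniform(0, area), rng.uniform(0, area)) for _ in range(n_particles)]
    log_w: List[float] = []
    for px, py in particles:
        total = 0.0
        for dx, dy in detections:
            cx, cy = px + dx, py + dy  # crater position if the rover were at this particle
            # Data association: match to the nearest crater in the offboard map.
            d2 = min((cx - mx) ** 2 + (cy - my) ** 2 for mx, my in crater_map)
            total -= d2 / (2.0 * sigma ** 2)  # Gaussian log-likelihood, up to a constant
        log_w.append(total)
    peak = max(log_w)
    w = [math.exp(lw - peak) for lw in log_w]  # subtract the max for numerical stability
    s = sum(w)
    x_hat = sum(wi * p[0] for wi, p in zip(w, particles)) / s
    y_hat = sum(wi * p[1] for wi, p in zip(w, particles)) / s
    return x_hat, y_hat
```

The weighted particle mean serves as the position estimate here; a full filter would also propagate particles with odometry and resample between updates.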

[73]  arXiv:2405.01674 [pdf, ps, other]
Title: Generative AI in Cybersecurity
Subjects: Cryptography and Security (cs.CR)

The dawn of Generative Artificial Intelligence (GAI), characterized by advanced models such as Generative Pre-trained Transformers (GPT) and other Large Language Models (LLMs), has been pivotal in reshaping the fields of data analysis, pattern recognition, and decision-making. This surge in GAI technology has not only ushered in innovative opportunities for data processing and automation but has also introduced significant cybersecurity challenges.
As GAI rapidly progresses, it outstrips the current pace of cybersecurity protocols and regulatory frameworks, leading to a paradox wherein the same innovations meant to safeguard digital infrastructures also enhance the arsenal available to cyber criminals. These adversaries, adept at swiftly integrating and exploiting emerging technologies, may utilize GAI to develop malware that is both more covert and adaptable, thus complicating traditional cybersecurity efforts.
The acceleration of GAI presents an ambiguous frontier for cybersecurity experts, offering potent tools for threat detection and response, while concurrently providing cyber attackers with the means to engineer more intricate and potent malware. Through the joint efforts of Duke Pratt School of Engineering, Coalfire, and Safebreach, this research undertakes a meticulous analysis of how malicious agents are exploiting GAI to augment their attack strategies, emphasizing a critical issue for the integrity of future cybersecurity initiatives. The study highlights the critical need for organizations to proactively identify and develop more complex defensive strategies to counter the sophisticated employment of GAI in malware creation.

[74]  arXiv:2405.01675 [pdf, ps, other]
Title: Clones, closed categories, and combinatory logic
Authors: Philip Saville
Comments: A slightly-extended version of the paper published at Foundations of Software Science and Computation Structures (FoSSaCS) 2024
Journal-ref: In: Kobayashi, N., Worrell, J. (eds) Foundations of Software Science and Computation Structures. FoSSaCS 2024. Lecture Notes in Computer Science, vol 14575. Springer, Cham
Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT)

We give an exposition of the semantics of the simply-typed lambda-calculus, and its linear and ordered variants, using multi-ary structures. We define universal properties for multicategories, and use these to derive familiar rules for products, tensors, and exponentials. Finally, we explain how to recover both the category-theoretic syntactic model and its semantic interpretation from the multi-ary framework.
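
For readers less familiar with combinatory logic, the S and K combinators can be modeled as curried Python functions. This is only an informal illustration of the combinators themselves, not of the paper's SK-clones or SK-categories: `S K K` applied to any argument returns that argument, so an extensional equality identifies it with the identity combinator.

```python
from typing import Any, Callable

Fn = Callable[[Any], Any]


def S(x: Fn) -> Fn:
    # S x y z = x z (y z)
    return lambda y: lambda z: x(z)(y(z))


def K(x: Any) -> Fn:
    # K x y = x
    return lambda y: x


# I = S K K, since S K K z = K z (K z) = z for every z.
I = S(K)(K)
```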
We then use these ideas to study the semantic interpretation of combinatory logic and the simply-typed lambda-calculus without products. We introduce extensional SK-clones and show these are sound and complete for both combinatory logic with extensional weak equality and the simply-typed lambda-calculus without products. We then show such SK-clones are equivalent to a variant of closed categories called SK-categories, so the simply-typed lambda-calculus without products is the internal language of SK-categories. As a corollary, we deduce that SK-categories have the same relationship to cartesian monoidal categories that closed categories have to monoidal categories.

[75]  arXiv:2405.01677 [pdf, other]
Title: Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. Initially, we analyze the conflict between reward and safety gradients. Subsequently, we tackle the balance between reward and safety optimization by proposing a soft switching policy optimization method, for which we provide convergence analysis. Based on our theoretical examination, we provide a safe RL framework to overcome the aforementioned challenge, and we develop a Safety-MuJoCo Benchmark to assess the performance of safe RL algorithms. Finally, we evaluate the effectiveness of our method on the Safety-MuJoCo Benchmark and a popular safe benchmark, Omnisafe. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.
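
The notion of a gradient conflict can be sketched in a few lines of Python. The sketch below detects conflict as a negative inner product between the reward and safety gradients and, when it occurs, projects the reward gradient onto the plane orthogonal to the safety gradient (a one-sided, PCGrad-style projection). This is a swapped-in, standard gradient-surgery technique used for illustration; it is not the paper's soft switching method, whose details the abstract does not give.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def combine_gradients(g_reward: Sequence[float],
                      g_safety: Sequence[float]) -> Tuple[List[float], bool]:
    """Return (update direction, whether the two gradients conflicted)."""
    d = dot(g_reward, g_safety)
    if d >= 0:
        # No conflict: simply add the two gradients.
        return [r + s for r, s in zip(g_reward, g_safety)], False
    # Conflict (negative inner product): remove the component of the reward
    # gradient that points against the safety gradient, then add them.
    ss = dot(g_safety, g_safety)
    g_r = [r - (d / ss) * s for r, s in zip(g_reward, g_safety)]
    return [r + s for r, s in zip(g_r, g_safety)], True
```

Giving the safety gradient priority in this way is one design choice among several; a soft switching scheme would instead trade the two objectives off more gradually.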

[76]  arXiv:2405.01678 [pdf, other]
Title: 1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy
Comments: 12 pages, 7 figures, 7 tables, 10th ACM International Workshop on Security and Privacy Analytics (IWSPA 2024)
Subjects: Computation and Language (cs.CL)

The study of privacy-preserving Natural Language Processing (NLP) has gained rising attention in recent years. One promising avenue studies the integration of Differential Privacy in NLP, which has brought about innovative methods in a variety of application settings. Of particular note are word-level Metric Local Differential Privacy (MLDP) mechanisms, which obfuscate potentially sensitive input text by performing word-by-word perturbations. Although these methods have shown promising results in empirical tests, they have two major drawbacks: (1) the inevitable loss of utility due to the addition of noise, and (2) the computational expense of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing 1-Diffractor, a new mechanism that achieves substantial speedups over previous mechanisms while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate 1-Diffractor for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. 1-Diffractor shows significant improvements in efficiency while maintaining competitive utility and privacy scores across all comparative tests against previous MLDP mechanisms. Our code is available at: https://github.com/sjmeis/Diffractor.
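
For context on why earlier MLDP mechanisms are costly, here is a minimal Python sketch of the generic noise-then-nearest-neighbour scheme that several earlier word-level mechanisms build on: add noise with density proportional to exp(-epsilon * ||z||) to the word's embedding, then replace the word with the vocabulary word closest to the noisy vector. The toy vocabulary and function name are hypothetical, and this is not 1-Diffractor; the full vocabulary scan in the last step is the kind of high-dimensional work the abstract identifies as a bottleneck.

```python
import math
import random
from typing import Dict, List


def mldp_perturb(word: str, embeddings: Dict[str, List[float]],
                 epsilon: float, rng: random.Random) -> str:
    vec = embeddings[word]
    d = len(vec)
    # Uniform random direction: normalize a standard Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    g_norm = math.sqrt(sum(x * x for x in g))
    direction = [x / g_norm for x in g]
    # Radius ~ Gamma(shape=d, scale=1/epsilon) gives a noise density
    # proportional to exp(-epsilon * ||z||) in d dimensions.
    radius = rng.gammavariate(d, 1.0 / epsilon)
    noisy = [v + radius * u for v, u in zip(vec, direction)]
    # Map back to the vocabulary: nearest neighbour by Euclidean distance.
    # This full scan is the expensive step on large, high-dimensional vocabularies.
    return min(embeddings, key=lambda w: math.dist(embeddings[w], noisy))
```

Larger `epsilon` means less noise (weaker privacy, higher utility); with a very large `epsilon` the word is almost always returned unchanged.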

[77]  arXiv:2405.01680 [pdf, other]
Title: Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations
Comments: Accepted at IJCAI 2024
Subjects: Machine Learning (cs.LG)

The residual loss in Physics-Informed Neural Networks (PINNs) alters the simple recursive relation of layers in a feed-forward neural network by applying a differential operator, resulting in a loss landscape that is inherently different from those of common supervised problems. Therefore, relying on the existing theory leads to unjustified design choices and suboptimal performance. In this work, we analyze the residual loss by studying its characteristics at critical points to find the conditions that result in effective training of PINNs. Specifically, we first show that under certain conditions, the residual loss of PINNs can be globally minimized by a wide neural network. Furthermore, our analysis also reveals that an activation function with well-behaved high-order derivatives plays a crucial role in minimizing the residual loss. In particular, to solve a $k$-th order PDE, the $k$-th derivative of the activation function should be bijective. The established theory paves the way for designing and choosing effective activation functions for PINNs and explains why periodic activations have shown promising performance in certain cases. Finally, we verify our findings by conducting a set of experiments on several PDEs. Our code is publicly available at https://github.com/nimahsn/pinns_tf2.

[78]  arXiv:2405.01681 [pdf, other]
Title: Accounting for the Effects of Probabilistic Uncertainty During Fast Charging of Lithium-ion Batteries
Comments: 6 pages, 5 figures, accepted for ACC 2024
Subjects: Systems and Control (eess.SY)

Batteries are nonlinear dynamical systems that can be modeled by Porous Electrode Theory (PET) models. The aim of optimal fast charging is to reduce the charging time while keeping battery degradation low. Most past studies assume that the ambient temperature is a fixed, known value and that all PET model parameters are perfectly known. In real battery operation, however, the ambient temperature and the model parameters are uncertain. To ensure that operational constraints are satisfied at all times in the context of model-based optimal control, uncertainty quantification is required. Here, we analyze optimal fast charging for modest uncertainty in the ambient temperature and 23 model parameters. Uncertainty quantification of the battery model is carried out using non-intrusive polynomial chaos expansion, and the results are verified with Monte Carlo simulations. The method is investigated for a constant current--constant voltage charging strategy, which is known to be standard for fast charging subject to operating below maximum current and charging constraints. Our results demonstrate that uncertainty in the ambient temperature results in violations of constraints on the voltage and temperature. Our results also identify a subset of key parameters that contribute to fast charging among the overall uncertain parameters. Additionally, it is shown that the constraints represented by voltage, temperature, and lithium-plating overpotential are violated due to uncertainties in the ambient temperature and parameters. The C-rate and charge constraints are then adjusted so that the probability of violating the degradation acceleration condition is below a pre-specified value. This provides a computationally efficient approach for determining fast-charging protocols that take probabilistic uncertainties into account.
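
Non-intrusive polynomial chaos expansion treats the model as a black box. The Python sketch below shows the idea in one dimension: sample a standard normal input, evaluate the model, fit coefficients on probabilists' Hermite polynomials by least squares, and read the mean and variance directly from the coefficients. The single Gaussian input and the toy model are assumptions for illustration; the paper's setting has 24 uncertain inputs (ambient temperature plus 23 parameters) and a PET battery model.

```python
import math
import random
from typing import Callable, List, Tuple


def hermite_e(n: int, x: float) -> float:
    """Probabilists' Hermite polynomial He_n(x) via He_{k+1} = x He_k - k He_{k-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def fit_pce(model: Callable[[float], float], degree: int,
            n_samples: int, seed: int = 0) -> List[float]:
    """Sample xi ~ N(0, 1), run the black-box model, and fit Hermite
    coefficients by least squares (normal equations)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    rows = [[hermite_e(k, x) for k in range(degree + 1)] for x in xs]
    m = degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve(ata, aty)


def pce_moments(coeffs: List[float]) -> Tuple[float, float]:
    """Mean and variance from the orthogonality E[He_j He_k] = k! * delta_jk."""
    mean = coeffs[0]
    var = sum(c * c * math.factorial(k) for k, c in enumerate(coeffs) if k > 0)
    return mean, var
```

For a model that is itself a low-degree polynomial in the input, the fit recovers it exactly; the resulting mean and variance could be checked against plain Monte Carlo sampling, mirroring the verification step described in the abstract.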

[79]  arXiv:2405.01682 [pdf, other]
Title: Leveraging Prompt-Learning for Structured Information Extraction from Crohn's Disease Radiology Reports in a Low-Resource Language
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Automatic conversion of free-text radiology reports into structured data using Natural Language Processing (NLP) techniques is crucial for analyzing diseases on a large scale. While effective for tasks in widely spoken languages like English, generative large language models (LLMs) typically underperform with less common languages and can pose potential risks to patient privacy. Fine-tuning local NLP models is hindered by the skewed nature of real-world medical datasets, where rare findings create a significant data imbalance. We introduce SMP-BERT, a novel prompt learning method that leverages the structured nature of reports to overcome these challenges. In our studies involving a substantial collection of Crohn's disease radiology reports in Hebrew (over 8,000 patients and 10,000 reports), SMP-BERT greatly surpassed traditional fine-tuning methods, notably in detecting infrequent conditions (AUC: 0.99 vs 0.94, F1: 0.84 vs 0.34). SMP-BERT makes more accurate AI diagnostics available for low-resource languages.

[80]  arXiv:2405.01684 [pdf, other]
Title: Intelligent Switching for Reset-Free RL
Comments: Published at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the real world, the strong episode resetting mechanisms that are needed to train agents in simulation are unavailable. The resetting assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires the creation of additional handcrafted mechanisms or human interventions. Recent work aims to train agents (forward) with learned resets by constructing a second (backward) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial for algorithm success. With this in mind, we create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC), which intelligently switches between the two agents based on the agent's confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging environments for reset-free RL.

[81]  arXiv:2405.01686 [pdf, other]
Title: Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models
Comments: 24 pages, 7 figures, 6 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individual trials to be synthesized. Ideally, language technologies would permit fully automatic meta-analysis, on demand. This requires accurately extracting numerical results from individual trials, which has been beyond the capabilities of natural language processing (NLP) models to date. In this work, we evaluate whether modern large language models (LLMs) can reliably perform this task. We annotate (and release) a modest but granular evaluation dataset of clinical trial reports with numerical findings attached to interventions, comparators, and outcomes. Using this dataset, we evaluate the performance of seven LLMs applied zero-shot for the task of conditionally extracting numerical findings from trial reports. We find that massive LLMs that can accommodate lengthy inputs are tantalizingly close to realizing fully automatic meta-analysis, especially for dichotomous (binary) outcomes (e.g., mortality). However, LLMs -- including ones trained on biomedical texts -- perform poorly when the outcome measures are complex and tallying the results requires inference. This work charts a path toward fully automatic meta-analysis of RCTs via LLMs, while also highlighting the limitations of existing models for this aim.
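
Once numerical results for a dichotomous outcome have been extracted (event counts and group sizes for the intervention and comparator arms), the meta-analytic step itself is straightforward. A minimal Python sketch using log risk ratios and fixed-effect inverse-variance pooling follows; the choice of the risk ratio and a fixed-effect model is an assumption for illustration, and the sketch assumes every arm has at least one event (no continuity correction).

```python
import math
from typing import List, Tuple

# (events_treatment, n_treatment, events_control, n_control) for one trial
Trial = Tuple[int, int, int, int]


def log_risk_ratio(t: Trial) -> Tuple[float, float]:
    """Log risk ratio and its large-sample variance for one 2x2 table."""
    a, n1, c, n2 = t
    y = math.log((a / n1) / (c / n2))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return y, var


def fixed_effect_pool(trials: List[Trial]) -> Tuple[float, float]:
    """Inverse-variance pooled risk ratio and its standard error on the log scale."""
    ys, ws = [], []
    for t in trials:
        y, var = log_risk_ratio(t)
        ys.append(y)
        ws.append(1.0 / var)  # more precise trials get more weight
    pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))
    return math.exp(pooled), se
```

For example, a single trial with 10/100 events in the treatment arm and 20/100 in the control arm gives a risk ratio of 0.5; every count fed into this step is exactly the kind of number the LLMs in the study are asked to extract.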

[82]  arXiv:2405.01687 [pdf, ps, other]
Title: Compactness via Pattern Stepping Bisimulation
Authors: Matias Scharager
Subjects: Programming Languages (cs.PL)

The compactness lemma in programming language theory states that any recursive function can be simulated by a finite unrolling of the function. One important use is in the logical relations proof technique for proving properties of typed programs, such as strong normalization. The relation between recursive functions and their finite counterparts is a special variant of the class of bisimulation relations. However, standard bisimulation proof approaches do not apply to the compactness lemma, as properties of the relation vary over execution. As a result, the proof of compactness is often messy, because the multiple copies of the recursive function made during execution can be unrolled an inconsistent number of times. We present a new proof technique that indexes the bisimulation relation over the step transitions and uses an intermediate "pattern" language to mechanize bookkeeping. This generalization, "pattern stepping bisimulation", obviates the need for contextual approximation within the compactness lemma, and thus extends the compactness lemma to a wider range of programming languages, including those that incorporate control flow effects. We demonstrate this approach by formally verifying the compactness lemma within the Coq theorem prover in the setting of explicit control flow and polymorphism.
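
The statement of the compactness lemma can be illustrated (though not proved) in Python: the n-fold unrolling of a recursive definition starts from a diverging function and applies one layer of the definition n times. A terminating call succeeds once n is large enough and fails when n is too small. The names `unroll`, `fact_step`, and `Diverged` are hypothetical, and raising an exception only stands in for non-termination.

```python
from typing import Callable

IntFn = Callable[[int], int]


class Diverged(Exception):
    """Raised when the finite unrolling runs out; stands in for non-termination."""


def bottom(_: int) -> int:
    raise Diverged()


def unroll(step: Callable[[IntFn], IntFn], n: int) -> IntFn:
    """The n-fold unrolling step^n(bottom) of the fixed point of `step`."""
    f: IntFn = bottom
    for _ in range(n):
        f = step(f)
    return f


def fact_step(rec: IntFn) -> IntFn:
    # One layer of the recursive definition of factorial.
    return lambda k: 1 if k == 0 else k * rec(k - 1)
```

Here `unroll(fact_step, 6)(5)` returns 120, while `unroll(fact_step, 5)(5)` raises `Diverged`, because computing `5!` needs six layers of the definition.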

[83]  arXiv:2405.01688 [pdf, other]
Title: Adapting Self-Supervised Learning for Computational Pathology
Comments: Presented at DCA in MI Workshop, CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Self-supervised learning (SSL) has emerged as a key technique for training networks that can generalize well to diverse tasks without task-specific supervision. This property makes SSL desirable for computational pathology, the study of digitized images of tissues, as there are many target applications and often limited labeled training samples. However, SSL algorithms and models have been primarily developed in the field of natural images and whether their performance can be improved by adaptation to particular domains remains an open question. In this work, we present an investigation of modifications to SSL for pathology data, specifically focusing on the DINOv2 algorithm. We propose alternative augmentations, regularization functions, and position encodings motivated by the characteristics of pathology images. We evaluate the impact of these changes on several benchmarks to demonstrate the value of tailored approaches.

[84]  arXiv:2405.01689 [pdf, other]
Title: Investigation on optimal microstructure of dual-phase steel with high strength and ductility by machine learning
Comments: 27 pages, 23 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE)

In this study, we developed an inverse analysis framework that proposes a microstructure for dual-phase (DP) steel that exhibits high strength and ductility. The proposed inverse analysis method involves repeated random searches on a model that combines a generative adversarial network (GAN), which generates microstructures, and a convolutional neural network (CNN), which predicts the maximum stress and working limit strain from DP steel microstructures. The GAN was trained using images of DP steel microstructures generated by the phase-field method. The CNN was trained using images of DP steel microstructures together with the maximum stress and the working limit strain calculated by the dislocation-crystal plasticity finite element method. Because the GAN's latent variables define a low-dimensional search space, the constructed framework enables an efficient search for microstructures. Multiple deformation modes were considered in this framework, which allowed the required microstructures to be explored under complex deformation modes. Using the developed framework, a microstructure with a fine grain size was proposed.

[85]  arXiv:2405.01690 [pdf, other]
Title: Addressing the Load Estimation Problem: Cell Switching in HAPS-Assisted Sustainable 6G Networks
Comments: arXiv admin note: substantial text overlap with arXiv:2402.04386
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

This study aims to introduce and address the problem of traffic load estimation in the cell switching concept within the evolving landscape of vertical heterogeneous networks (vHetNets). The problem is that the practice of cell switching faces a significant challenge due to the lack of accurate data on the traffic load of sleeping small base stations (SBSs). This problem makes the majority of the studies in the literature, particularly those employing load-dependent approaches, impractical due to their basic assumption of perfect knowledge of the traffic loads of sleeping SBSs for the next time slot. Rather than developing another advanced cell switching algorithm, this study investigates the impacts of estimation errors and explores possible solutions through established methodologies in a novel vHetNet environment that integrates a high altitude platform station (HAPS) as a super macro base station (SMBS) into the terrestrial network. In other words, this study adopts a more foundational perspective, focusing on eliminating a significant obstacle to the application of advanced cell switching algorithms. To this end, we explore the potential of three distinct spatial interpolation-based estimation schemes: random neighboring selection, distance-based selection, and clustering-based selection. Utilizing a real dataset for empirical validation, we evaluate the efficacy of our proposed traffic load estimation schemes. Our results demonstrate that the multi-level clustering (MLC) algorithm performs exceptionally well, with a difference of only 0.8% between its estimated and actual network power consumption, highlighting its potential to significantly improve energy efficiency in vHetNets.
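
To give a flavour of what a distance-based estimation scheme can look like, here is a Python sketch of inverse distance weighting (IDW), a standard spatial interpolation method: the load of a sleeping SBS is estimated from its k nearest awake neighbours, weighted by inverse distance. IDW itself, the exponent `power`, and `k` are assumptions for illustration; the abstract does not specify how the paper's distance-based selection weights neighbours.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def idw_estimate(target: Point, awake: List[Tuple[Point, float]],
                 k: int = 3, power: float = 2.0) -> float:
    """Estimate a sleeping SBS's load from its k nearest awake neighbours."""
    nearest = sorted(awake, key=lambda nb: math.dist(target, nb[0]))[:k]
    num = den = 0.0
    for pos, load in nearest:
        d = math.dist(target, pos)
        if d == 0.0:
            return load  # co-located neighbour: use its load directly
        w = 1.0 / d ** power  # closer neighbours count more
        num += w * load
        den += w
    return num / den
```

With two equidistant neighbours carrying loads 0.2 and 0.6 and `k=2`, the estimate is their average, 0.4; adding a distant third neighbour barely moves it.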

[86]  arXiv:2405.01691 [pdf, other]
Title: Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving
Comments: Presented at the Robot Trust for Symbiotic Societies (RTSS) Workshop, co-located with ICRA 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

Out-of-distribution (OOD) detection is essential in autonomous driving to determine when learning-based components encounter unexpected inputs. Traditional detectors typically use encoder models with fixed settings, thus lacking effective human interaction capabilities. With the rise of large foundation models, multimodal inputs offer the possibility of taking human language as a latent representation, thus enabling language-defined OOD detection. In this paper, we use the cosine similarity of image and text representations encoded by the multimodal model CLIP as a new representation to improve the transparency and controllability of latent encodings used for visual anomaly detection. We compare our approach with existing pre-trained encoders that can only produce latent representations that are meaningless from the user's standpoint. Our experiments on realistic driving data show that the language-based latent representation performs better than the traditional representation of the vision encoder and helps improve the detection performance when combined with standard representations.
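
The representation described above can be sketched as one cosine similarity per text prompt. In the Python sketch below, image and prompt embeddings are assumed to be precomputed (for example, by CLIP's image and text encoders) and are given as plain lists; the k-nearest-neighbour distance score is a hypothetical downstream detector, not necessarily the paper's.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def language_representation(image_emb: Sequence[float],
                            prompt_embs: Sequence[Sequence[float]]) -> List[float]:
    """One coordinate per text prompt: how strongly the image matches it."""
    return [cosine(image_emb, p) for p in prompt_embs]


def knn_ood_score(rep: Sequence[float], train_reps: Sequence[Sequence[float]]) -> float:
    """Distance to the nearest in-distribution representation (higher = more OOD)."""
    return min(math.dist(rep, r) for r in train_reps)
```

Because each coordinate corresponds to a human-readable prompt, a user can inspect or edit the prompt set, which is the transparency and controllability argument made in the abstract.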

[87]  arXiv:2405.01693 [pdf, other]
Title: Adversarial Attacks on Reinforcement Learning Agents for Command and Control
Subjects: Cryptography and Security (cs.CR)

Given the recent impact of Deep Reinforcement Learning in training agents to win complex games like StarCraft and DotA (Defense of the Ancients), there has been a surge in research on exploiting learning-based techniques for professional wargaming, battlefield simulation, and modeling. Real-time strategy games and simulators have become a valuable resource for operational planning and military research. However, recent work has shown that such learning-based approaches are highly susceptible to adversarial perturbations. In this paper, we investigate the robustness of an agent trained for a Command and Control (C2) task in an environment that is controlled by an active adversary. The C2 agent is trained on custom StarCraft II maps using the state-of-the-art RL algorithms A3C and PPO. We empirically show that an agent trained using these algorithms is highly susceptible to noise injected by the adversary and investigate the effects these perturbations have on the performance of the trained agent. Our work highlights the urgent need to develop more robust training algorithms, especially for critical arenas like the battlefield.

[88]  arXiv:2405.01695 [pdf, other]
Title: Requirements-driven Slicing of Simulink Models Using LLMs
Comments: This paper will appear at the 11th International Workshop on Artificial Intelligence and Requirements Engineering (AIRE'24)
Subjects: Software Engineering (cs.SE)

Model slicing is a useful technique for identifying a subset of a larger model that is relevant to fulfilling a given requirement. Notable applications of slicing include reducing inspection effort when checking design adequacy to meet requirements of interest and when conducting change impact analysis. In this paper, we present a method based on large language models (LLMs) for extracting model slices from graphical Simulink models. Our approach converts a Simulink model into a textual representation, uses an LLM to identify the necessary Simulink blocks for satisfying a specific requirement, and constructs a sound model slice that incorporates the blocks identified by the LLM. We explore how different levels of granularity (verbosity) in transforming Simulink models into textual representations, as well as the strategy used to prompt the LLM, impact the accuracy of the generated slices. Our preliminary findings suggest that prompts created by textual representations that retain the syntax and semantics of Simulink blocks while omitting visual rendering information of Simulink models yield the most accurate slices. Furthermore, the chain-of-thought and zero-shot prompting strategies result in the largest number of accurate model slices produced by our approach.
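
The abstract does not say how soundness of the slice is ensured. One plausible approach, assumed here purely for illustration, is to close the LLM-selected blocks under data dependencies, so that every block feeding a selected block is also kept. A minimal Python sketch with a hypothetical block graph follows.

```python
from typing import Dict, Iterable, List, Set


def backward_slice(seeds: Iterable[str], inputs_of: Dict[str, List[str]]) -> Set[str]:
    """Close the LLM-selected blocks under 'feeds into', so every signal
    source a selected block depends on is kept in the slice."""
    kept: Set[str] = set()
    stack = list(seeds)
    while stack:
        block = stack.pop()
        if block in kept:
            continue
        kept.add(block)
        stack.extend(inputs_of.get(block, []))
    return kept
```

In a toy model where `In1 -> Gain -> Sum <- In2` feeds `Out1` and an unrelated `In3 -> Scope` path exists, selecting `Out1` keeps the first five blocks and drops the `Scope` path.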

[89]  arXiv:2405.01697 [pdf, other]
Title: Towards an Ethical and Inclusive Implementation of Artificial Intelligence in Organizations: A Multidimensional Framework
Comments: This is an English version of the original article arXiv:2405.00225v1 [cs.CY] (Hacia una implementaci\'on \'etica e inclusiva de la Inteligencia Artificial en las organizaciones: un marco multidimensional)
Subjects: Computers and Society (cs.CY)

This article analyzes the impact of artificial intelligence (AI) on contemporary society and the importance of adopting an ethical approach to its development and implementation within organizations. It examines the technocritical perspective of some philosophers and researchers, who warn of the risks of excessive technologization that could undermine human autonomy. However, the article also acknowledges the active role that various actors, such as governments, academics, and civil society, can play in shaping the development of AI aligned with human and social values.
A multidimensional approach is proposed that combines ethics with regulation, innovation, and education. It highlights the importance of developing detailed ethical frameworks, incorporating ethics into the training of professionals, conducting ethical impact audits, and encouraging the participation of stakeholders in the design of AI.
In addition, four fundamental pillars are presented for the ethical implementation of AI in organizations: 1) Integrated values, 2) Trust and transparency, 3) Empowering human growth, and 4) Identifying strategic factors. These pillars encompass aspects such as alignment with the company's ethical identity, governance and accountability, human-centered design, continuous training, and adaptability to technological and market changes.
The conclusion emphasizes that ethics must be the cornerstone of any organization's strategy that seeks to incorporate AI, establishing a solid framework that ensures that technology is developed and used in a way that respects and promotes human values.

[90]  arXiv:2405.01699 [pdf, other]
Title: SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients
Comments: 7 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Small object detection in aerial imagery presents significant challenges in computer vision due to the minimal data inherent in small-sized objects and their propensity to be obscured by larger objects and background noise. Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases, which adversely affect their performance with objects of varying orientations and scales. This underscores the need for more adaptable, lightweight models. In response, this paper introduces two innovative approaches that significantly enhance detection and segmentation capabilities for small aerial objects. First, we explore the use of the SAHI framework on the newly introduced lightweight YOLO v9 architecture, which utilizes Programmable Gradient Information (PGI) to reduce the substantial information loss typically encountered in sequential feature extraction processes. Second, the paper employs the Vision Mamba model, which incorporates position embeddings to facilitate precise location-aware visual understanding, combined with a novel bidirectional State Space Model (SSM) for effective visual context modeling. This State Space Model combines the linear complexity of CNNs with the global receptive field of Transformers, making it particularly effective in remote sensing image classification. Our experimental results demonstrate substantial improvements in detection accuracy and processing efficiency, validating the applicability of these approaches for real-time small object detection across diverse aerial scenarios. This paper also discusses how these methodologies could serve as foundational models for future advancements in aerial object recognition technologies. The source code will be made accessible here.
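
SAHI's core idea, sliced inference, runs a detector on overlapping tiles so that small objects occupy more of each input. The Python sketch below only computes tile coordinates; it does not use the SAHI library's API, and the tile size and overlap ratio are hypothetical parameters.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


def tile_starts(length: int, tile: int, step: int) -> List[int]:
    starts = list(range(0, max(length - tile, 0) + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # last tile flush with the border
    return starts


def slice_boxes(width: int, height: int, tile: int, overlap: float) -> List[Box]:
    """Overlapping tiles covering the whole image, for sliced (SAHI-style) inference."""
    step = max(1, round(tile * (1.0 - overlap)))
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_starts(height, tile, step)
            for x in tile_starts(width, tile, step)]
```

Detections from each tile would then be shifted back by the tile offset `(x0, y0)` and merged, for example with non-maximum suppression; that merging step is not shown.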

[91]  arXiv:2405.01701 [pdf, ps, other]
Title: Active Learning Enabled Low-cost Cell Image Segmentation Using Bounding Box Annotation
Authors: Yu Zhu, Qiang Yang, Li Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cell image segmentation is usually implemented using fully supervised deep learning methods, which rely heavily on extensive annotated training data. Yet, due to the complexity of cell morphology and the need for specialized knowledge, pixel-level annotation of cell images is a highly labor-intensive task. To address these problems, we propose an active learning framework for cell segmentation using bounding box annotations, which greatly reduces the data annotation cost of cell segmentation algorithms. First, we develop a box-supervised learning method (denoted as YOLO-SAM) by combining the YOLOv8 detector with the Segment Anything Model (SAM), which effectively reduces the complexity of data annotation. Furthermore, this method is integrated into an active learning framework that employs the MC DropBlock method to train the segmentation model with fewer box-annotated samples. Extensive experiments demonstrate that our model saves more than ninety percent of data annotation time compared to mask-supervised deep learning methods.

[92]  arXiv:2405.01702 [pdf, other]
Title: Optimization without retraction on the random generalized Stiefel manifold
Comments: 21 pages, 10 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Optimization over the set of matrices that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). These problems are typically solved by iterative methods, such as Riemannian approaches, which require a computationally expensive eigenvalue decomposition involving the fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to a random estimate of the feasible set. Our method does not enforce the constraint exactly at every iteration; instead, it produces iterates that converge to a critical point on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian counterparts involving the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and GEVP.
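A minimal NumPy sketch of the constraint itself, assuming B is the sample covariance of a synthetic data matrix A (an illustration, not the paper's algorithm; all function names are hypothetical). It measures the violation of $X^\top B X = I_p$, shows a normalisation that needs products with the full B plus a factorisation (the kind of exact feasibility step the proposed method avoids), and forms the cheap minibatch estimate of B, which is unbiased but only makes the constraint hold in expectation.

```python
import numpy as np

def constraint_residual(X, B):
    """Frobenius norm of X^T B X - I_p; zero exactly on the generalized Stiefel manifold."""
    p = X.shape[1]
    return float(np.linalg.norm(X.T @ B @ X - np.eye(p)))

def b_orthonormalize(X, B):
    """Return X L^{-T}, where X^T B X = L L^T (Cholesky); the result satisfies the constraint."""
    L = np.linalg.cholesky(X.T @ B @ X)
    return X @ np.linalg.inv(L).T

def minibatch_estimate(A, batch_size, rng):
    """Unbiased estimate of B = A^T A / n built from a random subset of the rows of A."""
    idx = rng.choice(A.shape[0], size=batch_size, replace=False)
    Ab = A[idx]
    return Ab.T @ Ab / batch_size

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 6))   # synthetic data: n = 5000 samples, d = 6
B = A.T @ A / A.shape[0]             # full sample covariance
X = b_orthonormalize(rng.standard_normal((6, 2)), B)
print(constraint_residual(X, B))                               # round-off level: feasible for B
print(constraint_residual(X, minibatch_estimate(A, 64, rng)))  # positive: feasible only in expectation
```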

[93]  arXiv:2405.01704 [pdf, other]
Title: Privacy-aware Berrut Approximated Coded Computing for Federated Learning
Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Federated Learning (FL) is a strategy that enables the collaborative training of an AI model among different data owners without revealing their private datasets. Even so, FL has privacy vulnerabilities that prior work has tried to overcome with techniques such as Differential Privacy (DP), Homomorphic Encryption (HE), or Secure Multi-Party Computation (SMPC). However, these techniques have important drawbacks that may narrow their range of application: difficulty handling non-linear functions and large matrix multiplications, and high communication and computational costs when managing semi-honest nodes. In this context, we propose a solution that guarantees privacy in FL schemes while addressing all of these problems. Our proposal is based on Berrut Approximated Coded Computing, a technique from the Coded Distributed Computing paradigm, adapted to a Secret Sharing configuration to provide input privacy to FL in a scalable way. It can be used to compute non-linear functions and treats distributed matrix multiplication, a key primitive at the core of many automated learning tasks, as a special case. Because it is independent of the machine learning models and aggregation algorithms used in the FL scheme, it could be applied in a wide range of FL scenarios. We analyze the privacy achieved and the complexity of our solution, and extensive numerical results show a good trade-off between privacy and precision.
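A toy sketch of the coded-computing idea, assuming scalar data and omitting the secret-sharing masks that provide privacy in the actual proposal: data are encoded with Berrut's first rational interpolant, each worker applies the (possibly non-linear) function f to its encoded share only, and the master decodes with a second Berrut interpolation, recovering f on the data approximately. Berrut's interpolant with weights (-1)^i has no poles on the real line for ordered nodes, which keeps encoding and decoding numerically well behaved. The Chebyshev node choices and all function names are assumptions made for illustration.

```python
import numpy as np

def berrut(nodes, values, z):
    """Berrut's first rational interpolant (weights (-1)^i) through (nodes[i], values[i]), at z."""
    nodes = np.asarray(nodes, dtype=float)
    values = np.asarray(values, dtype=float)
    diff = z - nodes
    hit = np.isclose(diff, 0.0)
    if hit.any():                      # z is (numerically) a node: return its value exactly
        return float(values[np.argmax(hit)])
    w = (-1.0) ** np.arange(len(nodes)) / diff
    return float(np.dot(w, values) / w.sum())

def toy_coded_computation(f, data, n_workers):
    """Encode K scalars with Berrut, let each worker apply f, then decode with Berrut again."""
    K = len(data)
    alphas = np.cos((2 * np.arange(K) + 1) * np.pi / (2 * K))      # data points (Chebyshev, 1st kind)
    betas = np.cos(np.arange(n_workers) * np.pi / (n_workers - 1))  # worker points (Chebyshev, 2nd kind)
    shares = [berrut(alphas, data, b) for b in betas]   # one encoded input per worker
    results = [f(s) for s in shares]                    # workers only see encoded data
    return [berrut(betas, results, a) for a in alphas]  # approximations of f(data[j])

print(toy_coded_computation(lambda x: x ** 2, [0.1, 0.5, -0.3], 30))  # roughly [0.01, 0.25, 0.09]
```

The decoded values are approximations, not exact results, which is the precision side of the trade-off the abstract reports.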

[94]  arXiv:2405.01705 [pdf, other]
Title: Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Image and multimodal machine learning tasks are very challenging to solve in the case of poorly distributed data. In particular, data availability and privacy restrictions exacerbate these hurdles in the medical domain. The state of the art in image generation quality is held by Latent Diffusion models, making them prime candidates for tackling this problem. However, a few key issues still need to be solved, such as the difficulty in generating data from under-represented classes and a slow inference process. To mitigate these issues, we propose a new method for image augmentation in long-tailed data based on leveraging the rich latent space of pre-trained Stable Diffusion Models. We create a modified separable latent space to mix head and tail class examples. We build this space via Iterated Learning of underlying sparsified embeddings, which we apply to task-specific saliency maps via a K-NN approach. Code is available at https://github.com/SugarFreeManatee/Feature-Space-Augmentation-and-Iterated-Learning

[95]  arXiv:2405.01707 [pdf, ps, other]
Title: Stability of the Ghurye-Olkin Characterization of Vector Gaussian Distributions
Subjects: Information Theory (cs.IT)

The stability of the Ghurye-Olkin (GO) characterization of Gaussian vectors is analyzed using a partition of the vectors into equivalence classes defined by their matrix factors. The sum of the vectors in each class is near-Gaussian in the characteristic function (c.f.) domain if the GO independence condition is approximately met in the c.f. domain. All vectors have the property that any vector projection is near-Gaussian in the distribution function (d.f.) domain. The proofs of these c.f. and d.f. stabilities use tools that establish the stabilities of theorems by Kac-Bernstein and Cramér, respectively. The results are used to prove stability theorems for differential entropies of Gaussian vectors and blind source separation of non-Gaussian sources.

[96]  arXiv:2405.01708 [pdf, other]
Title: A deep causal inference model for fully-interpretable travel behaviour analysis
Subjects: Machine Learning (cs.LG)

Transport policy assessment often involves causal questions, yet the causal inference capabilities of traditional travel behavioural models are at best limited. We present the deep CAusal infeRence mOdel for traveL behavIour aNAlysis (CAROLINA), a framework that explicitly models causality in travel behaviour, enhances predictive accuracy, and maintains interpretability by leveraging causal inference, deep learning, and traditional discrete choice modelling. Within this framework, we introduce a Generative Counterfactual model for forecasting human behaviour by adapting the Normalizing Flow method. Through case studies of virtual reality-based pedestrian crossing behaviour, revealed preference travel behaviour from London, and synthetic data, we demonstrate the effectiveness of our proposed models in uncovering causal relationships, predicting behaviour accurately, and assessing policy interventions. Our results show that intervention mechanisms that reduce pedestrian stress levels lead to a 38.5% increase in individuals experiencing shorter waiting times. Reducing travel distances in London results in a 47% increase in sustainable travel modes.

[97]  arXiv:2405.01711 [pdf, ps, other]
Title: Individual Fairness Through Reweighting and Tuning
Comments: 14 pages, 1 figure, and 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Inherent bias within society can be amplified and perpetuated by artificial intelligence (AI) systems. To address this issue, a wide range of solutions have been proposed to identify and mitigate bias and enforce fairness for individuals and groups. Recently, the Graph Laplacian Regularizer (GLR), a regularization technique from the semi-supervised learning literature, has been used as a substitute for the common Lipschitz condition to enhance individual fairness (IF). Notable prior work has shown that enforcing IF through a GLR can improve the transfer learning accuracy of AI models under covariate shifts. However, the prior work defines a GLR on the source and target data combined, implicitly assuming that the target data are available at train time, which might not hold in practice. In this work, we investigated whether defining a GLR independently on the train and target data could maintain accuracy similar to that of the prior work's model. Furthermore, we introduced the Normalized Fairness Gain score (FGN) to measure IF for in-processing algorithmic fairness techniques. FGN quantifies the amount of fairness gained when a GLR is used versus not. We evaluated the new and original methods using FGN, Prediction Consistency (PC), and traditional classification metrics on the German Credit Approval dataset. The results showed that the two models achieved statistically similar mean performance over five-fold cross-validation. Furthermore, the proposed metric showed that PC scores can be misleading: they can be high and statistically similar to those of fairness-enhanced models even when FGN scores are small. This work therefore provides new insights into when a GLR effectively enhances IF and the pitfalls of PC.
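A minimal sketch of a Graph Laplacian Regularizer, assuming a dense Gaussian-kernel similarity graph (the kernel, `sigma`, and `lam` are illustrative choices, not taken from the paper). With $L = D - W$, the penalty satisfies $f^\top L f = \frac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$, so it grows when the model gives different outputs to individuals the graph marks as similar. Which points the graph is built on (train data alone, or train and target data combined) is exactly the design choice the abstract investigates.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Dense similarity graph W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), without self-loops."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def glr(f, W):
    """Graph Laplacian regularizer f^T L f with L = D - W (D = diagonal degree matrix)."""
    L = np.diag(W.sum(axis=1)) - W
    return float(f @ L @ f)

def regularized_loss(task_loss, f, W, lam=0.1):
    """Task loss plus lam * GLR: penalizes dissimilar outputs for similar individuals."""
    return task_loss + lam * glr(f, W)
```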

[98]  arXiv:2405.01713 [pdf, other]
Title: SUNDIALS Time Integrators for Exascale Applications with Many Independent ODE Systems
Subjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC)

Many complex systems can be accurately modeled as a set of coupled time-dependent partial differential equations (PDEs). However, solving such equations can be prohibitively expensive, easily taxing the world's largest supercomputers. One pragmatic strategy for attacking such problems is to split the PDEs into components that can more easily be solved in isolation. This operator splitting approach is used ubiquitously across scientific domains, and in many cases leads to a set of ordinary differential equations (ODEs) that need to be solved as part of a larger "outer-loop" time-stepping approach. The SUNDIALS library provides a plethora of robust time integration algorithms for solving ODEs, and the U.S. Department of Energy Exascale Computing Project (ECP) has supported its extension to applications on exascale-capable computing hardware. In this paper, we highlight some SUNDIALS capabilities and its deployment in combustion and cosmology application codes (Pele and Nyx, respectively) where operator splitting gives rise to numerous, small ODE systems that must be solved concurrently.
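As a hedged illustration of operator splitting (not SUNDIALS code and not its API), the sketch below applies first-order Lie splitting to the model problem $y' = -a\,y + s$: each step first solves the decay sub-problem and then the source sub-problem, both exactly, so the only error is the $O(h)$ splitting error. The state is a NumPy array holding many small, independent systems (one per "cell", with its own rate `a`), so all of them advance together. In production codes each sub-step would typically be handed to an ODE integrator; the model problem and function names are assumptions.

```python
import numpy as np

def lie_splitting(y0, a, s, T, h):
    """Integrate y' = -a*y + s for many independent cells with Lie (sequential) splitting."""
    y = np.array(y0, dtype=float)
    n_steps = int(round(T / h))
    decay = np.exp(-a * h)           # exact propagator of the decay sub-problem y' = -a*y
    for _ in range(n_steps):
        y = y * decay                # sub-step 1: decay, solved exactly over one step
        y = y + s * h                # sub-step 2: source y' = s, solved exactly over one step
    return y

def exact_solution(y0, a, s, T):
    """Closed-form solution of the unsplit problem, used to measure the splitting error."""
    return s / a + (y0 - s / a) * np.exp(-a * T)

a = np.array([1.0, 2.0, 5.0])        # one decay rate per independent cell
y0 = np.zeros(3)
for h in (0.1, 0.05):
    err = np.abs(lie_splitting(y0, a, 1.0, 1.0, h) - exact_solution(y0, a, 1.0, 1.0)).max()
    print(h, err)                    # error roughly halves with h: first-order splitting
```

Symmetric (Strang) splitting would raise the order to two at similar cost.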

[99]  arXiv:2405.01714 [pdf, other]
Title: Interpretable Vital Sign Forecasting with Model Agnostic Attention Maps
Comments: 8 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Sepsis is a leading cause of mortality in intensive care units (ICUs), representing a substantial medical challenge. The complexity of analyzing diverse vital signs to predict sepsis further aggravates this issue. While deep learning techniques have been developed for early sepsis prediction, their 'black-box' nature obscures the internal logic, impairing interpretability in critical settings like ICUs. This paper introduces a framework that combines a deep learning model with an attention mechanism that highlights the critical time steps in the forecasting process, thus improving model interpretability and supporting clinical decision-making. We show that the attention mechanism can be adapted to various black-box time series forecasting models such as N-HiTS and N-BEATS. Our method preserves the accuracy of conventional deep learning models while enhancing interpretability through attention-weight-generated heatmaps. We evaluated our model on the eICU-CRD dataset, focusing on forecasting vital signs for sepsis patients. We assessed its performance using mean squared error (MSE) and dynamic time warping (DTW) metrics. We explored the attention maps of N-HiTS and N-BEATS, examining the differences in their performance and identifying crucial factors influencing vital sign forecasting.
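Unlike MSE, which compares two series point by point, DTW first aligns them by warping the time axis and then sums local costs along the best alignment, so a forecast that is correct in shape but slightly shifted in time is not heavily penalized. A minimal dynamic-programming sketch, assuming absolute difference as the local cost (squared differences are also common); the function name is hypothetical.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance with |x_i - y_j| as local cost (O(n*m) dynamic programming)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: stay on y (repeat x), stay on x (repeat y), or advance both
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

x = [0.0, 1.0, 0.0, 0.0]
y = [0.0, 0.0, 1.0, 0.0]                       # same spike, one step later
print(np.mean((np.array(x) - np.array(y)) ** 2))  # MSE = 0.5
print(dtw_distance(x, y))                      # DTW = 0.0: the shift is absorbed by warping
```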

[100]  arXiv:2405.01716 [pdf, other]
Title: ATTAXONOMY: Unpacking Differential Privacy Guarantees Against Practical Adversaries
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Differential Privacy (DP) is a mathematical framework that is increasingly deployed to mitigate privacy risks associated with machine learning and statistical analyses. Despite the growing adoption of DP, its technical privacy parameters do not lend themselves to an intelligible description of the real-world privacy risks associated with that deployment: the guarantee that most naturally follows from the DP definition is protection against membership inference by an adversary who knows all but one data record and has unlimited auxiliary knowledge. In many settings, this adversary is far too strong to inform how to set real-world privacy parameters.
One approach for contextualizing privacy parameters is via defining and measuring the success of technical attacks, but doing so requires a systematic categorization of the relevant attack space. In this work, we offer a detailed taxonomy of attacks, showing the various dimensions of attacks and highlighting that many real-world settings have been understudied. Our taxonomy provides a roadmap for analyzing real-world deployments and developing theoretical bounds for more informative privacy attacks. We operationalize our taxonomy by using it to analyze a real-world case study, the Israeli Ministry of Health's recent release of a birth dataset using DP, showing how the taxonomy enables fine-grained threat modeling and provides insight towards making informed privacy parameter choices. Finally, we leverage the taxonomy towards defining a more realistic attack than previously considered in the literature, namely a distributional reconstruction attack: we generalize Balle et al.'s notion of reconstruction robustness to a less-informed adversary with distributional uncertainty, and extend the worst-case guarantees of DP to this average-case setting.

[101]  arXiv:2405.01717 [pdf, other]
Title: FSM Builder: A Tool for Writing Autograded Finite Automata Questions
Comments: 7 pages
Subjects: Computers and Society (cs.CY); Formal Languages and Automata Theory (cs.FL)

Deterministic and nondeterministic finite automata (DFAs and NFAs) are abstract models of computation commonly taught in introductory computing theory courses. These models have important applications (such as fast regular expression matching) and are used to introduce formal language theory. Undergraduate students often struggle to understand these models at first because of their level of abstraction. As a result, various pedagogical tools have been developed to allow students to practice with these models. We introduce the FSM Builder, a new pedagogical tool enabling students to practice constructing DFAs and NFAs with a graphical editor, giving personalized feedback and partial credit. The algorithms used to generate this feedback and partial credit are heavily inspired by previous work. The tool's key advantages over its competitors are greater flexibility and scalability, because the FSM Builder is implemented using efficient algorithms from an open source package, allowing for easy extension and question creation. We discuss the implementation of the tool, how it stands out from previous tools, and takeaways from experiences of using the tool in multiple large courses. Survey results indicate the interface and feedback provided by the tool were useful to students.
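A minimal sketch of one check an autograder for DFA questions can run, assuming complete DFAs stored as plain dictionaries (a hypothetical representation, not the FSM Builder's internals or the open source package it uses). A breadth-first search over the product of the reference and student automata either proves them equivalent or returns the shortest input string on which they disagree, which can be shown to the student as feedback.

```python
from collections import deque

def accepts(dfa, word):
    """Run a complete DFA {'start', 'accept', 'delta'} on a word."""
    state = dfa["start"]
    for ch in word:
        state = dfa["delta"][(state, ch)]
    return state in dfa["accept"]

def counterexample(d1, d2, alphabet):
    """Shortest word accepted by exactly one of d1, d2, or None if the DFAs are equivalent."""
    start = (d1["start"], d2["start"])
    queue = deque([(start, "")])
    seen = {start}
    while queue:                       # BFS over pairs of states, so the first mismatch is shortest
        (p, q), word = queue.popleft()
        if (p in d1["accept"]) != (q in d2["accept"]):
            return word
        for ch in alphabet:
            nxt = (d1["delta"][(p, ch)], d2["delta"][(q, ch)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + ch))
    return None

even_ones = {"start": "e", "accept": {"e"},   # reference: even number of 1s
             "delta": {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}}
accept_all = {"start": "s", "accept": {"s"},  # a wrong student answer
              "delta": {("s", "0"): "s", ("s", "1"): "s"}}
print(counterexample(even_ones, accept_all, "01"))  # "1"
```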

[102]  arXiv:2405.01718 [pdf, other]
Title: Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk
Authors: Xinyi Ni, Lifeng Lai
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Robust Markov Decision Processes (RMDPs) have received significant research interest, offering an alternative to standard Markov Decision Processes (MDPs), which often assume fixed transition probabilities. RMDPs address this by optimizing for the worst-case scenarios within ambiguity sets. While earlier studies on RMDPs have largely centered on risk-neutral reinforcement learning (RL), with the goal of minimizing expected total discounted costs, in this paper we analyze the robustness of CVaR-based risk-sensitive RL under RMDP. First, we consider predetermined ambiguity sets. Based on the coherency of CVaR, we establish a connection between robustness and risk sensitivity; thus, techniques in risk-sensitive RL can be adopted to solve the proposed problem. Furthermore, motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets. To solve this, we define a new risk measure named NCVaR and establish the equivalence of NCVaR optimization and robust CVaR optimization. We further propose value iteration algorithms and validate our approach in simulation experiments.
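A minimal sketch of CVaR for a sample of costs, taking $\alpha$ as the tail probability (conventions differ: some texts use $1-\alpha$). CVaR$_\alpha$ is the mean of the worst $\alpha$-fraction of outcomes, and by the Rockafellar-Uryasev formula it equals $\min_t \{ t + \mathbb{E}[(Z-t)_+]/\alpha \}$. Because CVaR is coherent, it also equals the worst-case expected cost over reweightings of the distribution whose density ratio is at most $1/\alpha$; this dual view is the kind of link between risk sensitivity and robustness the abstract builds on. The function name is hypothetical.

```python
import math
import numpy as np

def cvar(costs, alpha):
    """CVaR of the empirical cost distribution: mean of the worst alpha-fraction of outcomes.

    When alpha * n is not an integer, the boundary sample receives fractional weight."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    z = np.sort(np.asarray(costs, dtype=float))[::-1]  # largest costs first
    k = alpha * len(z)                                  # tail size measured in samples
    n_full = int(math.floor(k))
    tail = z[:n_full].sum()
    if n_full < len(z):
        tail += (k - n_full) * z[n_full]                # fractional weight on the boundary sample
    return float(tail / k)

print(cvar(range(1, 11), 0.2))   # 9.5: mean of the two largest costs, 10 and 9
print(cvar(range(1, 11), 1.0))   # 5.5: alpha = 1 recovers the ordinary mean
```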

[103]  arXiv:2405.01719 [pdf, other]
Title: Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmark
Comments: To be published in ICML 2024
Subjects: Machine Learning (cs.LG)

We examine multi-task benchmarks in machine learning through the lens of social choice theory. We draw an analogy between benchmarks and electoral systems, where models are candidates and tasks are voters. This suggests a distinction between cardinal and ordinal benchmark systems. The former aggregate numerical scores into one model ranking; the latter aggregate rankings for each task. We apply Arrow's impossibility theorem to ordinal benchmarks to highlight the inherent limitations of ordinal systems, particularly their sensitivity to the inclusion of irrelevant models. Inspired by Arrow's theorem, we empirically demonstrate a strong trade-off between diversity and sensitivity to irrelevant changes in existing multi-task benchmarks. Our result is based on new quantitative measures of diversity and sensitivity that we introduce. Sensitivity quantifies the impact that irrelevant changes to tasks have on a benchmark. Diversity captures the degree of disagreement in model rankings across tasks. We develop efficient approximation algorithms for both measures, as exact computation is computationally challenging. Through extensive experiments on seven cardinal benchmarks and eleven ordinal benchmarks, we demonstrate a clear trade-off between diversity and stability: the more diverse a multi-task benchmark, the more sensitive it is to trivial changes. Additionally, we show that the aggregated rankings of existing benchmarks are highly unstable under irrelevant changes. The code and data are available at https://socialfoundations.github.io/benchbench/.
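A toy illustration of the sensitivity to irrelevant models, assuming Borda count as the ordinal aggregation rule (an assumption; benchmarks use various rules). Five tasks rank models A, B, and C. Three of the five tasks prefer B over A, yet with C present A wins, because C sits between A and B in the tasks that favour A. Removing C, which never wins, flips the A-versus-B outcome, violating Arrow's independence of irrelevant alternatives.

```python
from collections import defaultdict

def borda(rankings):
    """Borda count: in a ranking of m models, position p (0 = best) earns m - 1 - p points."""
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for pos, model in enumerate(ranking):
            scores[model] += m - 1 - pos
    return dict(scores)

def drop_model(rankings, model):
    """Remove one model from every task ranking, keeping the others' relative order."""
    return [[x for x in r if x != model] for r in rankings]

tasks = [["A", "C", "B"], ["A", "C", "B"],
         ["B", "A", "C"], ["B", "A", "C"], ["B", "A", "C"]]
print(borda(tasks))                     # {'A': 7, 'C': 2, 'B': 6}: A ranked first
print(borda(drop_model(tasks, "C")))    # {'A': 2, 'B': 3}: B ranked first
```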

[104]  arXiv:2405.01723 [pdf, other]
Title: Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion
Comments: Accepted by the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Detecting and segmenting moving objects from a moving monocular camera is challenging in the presence of unknown camera motion, diverse object motions and complex scene structures. Most existing methods rely on a single motion cue to perform motion segmentation, which is usually insufficient when facing different complex environments. While a few recent deep learning based methods are able to combine multiple motion cues to achieve improved accuracy, they depend heavily on vast datasets and extensive annotations, making them less adaptable to new scenarios. To address these limitations, we propose a novel monocular dense segmentation method that achieves state-of-the-art motion segmentation results in a zero-shot manner. The proposed method synergistically combines the strengths of deep learning and geometric model fusion methods by performing geometric model fusion on object proposals. Experiments show that our method achieves competitive results on several motion segmentation datasets and even surpasses some state-of-the-art supervised methods on certain benchmarks, while not being trained on any data. We also present an ablation study to show the effectiveness of combining different geometric models together for motion segmentation, highlighting the value of our geometric model fusion strategy.

[105]  arXiv:2405.01724 [pdf, other]
Title: Large Language Models are Inconsistent and Biased Evaluators
Comments: 9 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The zero-shot capability of Large Language Models (LLMs) has enabled highly flexible, reference-free metrics for various tasks, making LLM evaluators common tools in NLP. However, the robustness of these LLM evaluators remains relatively understudied; existing work has mainly pursued optimal performance in terms of correlating LLM scores with human expert scores. In this paper, we conduct a series of analyses using the SummEval dataset and confirm that LLMs are biased evaluators as they: (1) exhibit familiarity bias (a preference for text with lower perplexity), (2) show skewed and biased distributions of ratings, and (3) experience anchoring effects for multi-attribute judgments. We also found that LLMs are inconsistent evaluators, showing low "inter-sample" agreement and sensitivity to prompt differences that are insignificant to human understanding of text quality. Furthermore, we share recipes for configuring LLM evaluators to mitigate these limitations. Experimental results on the RoSE dataset demonstrate improvements over the state-of-the-art LLM evaluators.

[106]  arXiv:2405.01728 [pdf, other]
Title: Explainability Guided Adversarial Evasion Attacks on Malware Detectors
Subjects: Cryptography and Security (cs.CR)

As the focus on security of Artificial Intelligence (AI) is becoming paramount, research on crafting and inserting optimal adversarial perturbations has become increasingly critical. In the malware domain, this adversarial sample generation relies heavily on the accuracy and placement of crafted perturbation with the goal of evading a trained classifier. This work focuses on applying explainability techniques to enhance the adversarial evasion attack on a machine-learning-based Windows PE malware detector. The explainable tool identifies the regions of PE malware files that have the most significant impact on the decision-making process of a given malware detector, and therefore, the same regions can be leveraged to inject the adversarial perturbation for maximum efficiency. Profiling all the PE malware file regions based on their impact on the malware detector's decision enables the derivation of an efficient strategy for identifying the optimal location for perturbation injection. The strategy should incorporate the region's significance in influencing the malware detector's decision and the sensitivity of the PE malware file's integrity towards modifying that region. To assess the utility of explainable AI in crafting an adversarial sample of Windows PE malware, we utilize the DeepExplainer module of SHAP for determining the contribution of each region of PE malware to its detection by a CNN-based malware detector, MalConv. Furthermore, we analyzed the significance of SHAP values at a more granular level by subdividing each section of Windows PE into small subsections. We then performed an adversarial evasion attack on the subsections based on the corresponding SHAP values of the byte sequences.

[107]  arXiv:2405.01731 [pdf, other]
Title: Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization
Comments: Accepted to ICML2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization by accounting for the heterogeneous curvature of the objective function. The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum. This approach significantly reduces the error in estimating the gradient from noisy evaluations through sampling. We demonstrate the efficacy of our method through numerical experiments on artificial problems. Additionally, we show improved performance when tuning NP-hard combinatorial optimization solvers compared to existing state-of-the-art heuristic derivative-free and Bayesian optimization methods.
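A minimal sketch of Gaussian-smoothing gradient estimation with an anisotropic kernel, assuming a fixed kernel shape matrix `L` (the proposed algorithm adapts the kernel dynamically; that adaptation is not shown, and the central-difference form, sample size, and function name are assumptions). For $u \sim \mathcal{N}(0, I)$ the smoothed objective is $f_\sigma(x) = \mathbb{E}[f(x + \sigma L u)]$, and its gradient is $\mathbb{E}\big[\tfrac{f(x+\sigma L u) - f(x-\sigma L u)}{2\sigma}\, L^{-\top} u\big]$; choosing $L = I$ recovers isotropic Gaussian smoothing.

```python
import numpy as np

def smoothed_gradient(f, x, L, sigma, n_samples, rng):
    """Monte Carlo gradient of E[f(x + sigma L u)], u ~ N(0, I), via central differences.

    The smoothing kernel has covariance sigma^2 L L^T; L = I gives isotropic smoothing."""
    d = x.shape[0]
    U = rng.standard_normal((n_samples, d))
    V = sigma * U @ L.T                       # row k is sigma * L u_k
    fp = np.array([f(x + v) for v in V])
    fm = np.array([f(x - v) for v in V])
    dirs = U @ np.linalg.inv(L)               # row k is (L^{-T} u_k)^T
    return (((fp - fm) / (2.0 * sigma))[:, None] * dirs).mean(axis=0)

H = np.diag([1.0, 100.0])                     # ill-conditioned quadratic test problem
f = lambda z: 0.5 * z @ H @ z
x = np.array([1.0, 1.0])
rng = np.random.default_rng(0)
print(smoothed_gradient(f, x, np.diag([1.0, 0.1]), 0.1, 20000, rng))  # close to H @ x = [1, 100]
```

For a quadratic, smoothing only adds a constant, so the estimator is unbiased for the true gradient $Hx$; the kernel shape affects the variance of the estimate, which is what shaping the kernel to the objective's curvature is meant to exploit.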

[108]  arXiv:2405.01733 [pdf, ps, other]
Title: Rings with common division, common meadows and their conditional equational theories
Subjects: Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC)

We examine the consequences of having a total division operation $\frac{x}{y}$ on commutative rings. We consider two forms of binary division, one derived from a unary inverse, the other defined directly as a general operation; each is made total by setting $1/0$ equal to an error value $\bot$, which is added to the ring. We call such totalised divisions common divisions. In a field the two forms are equivalent, and we have a finite equational axiomatisation $E$ that is complete for the equational theory of fields equipped with common division, called common meadows. These equational axioms $E$ turn out to be true of commutative rings with common division, but only when it is defined via inverses. We explore these axioms $E$ and their role in seeking a completeness theorem for the conditional equational theory of common meadows. We prove they are complete for the conditional equational theory of commutative rings with inverse-based common division. By adding a new proof rule, we can prove a completeness theorem for the conditional equational theory of common meadows. Although the equational axioms $E$ fail with common division defined directly, we observe that the direct division does satisfy the equations in $E$ under a new congruence for partial terms called eager equality.
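A minimal sketch of inverse-based common division over the rationals (a field), using Python's `fractions` module. The error value $\bot$ is modelled as a sentinel object; the sketch assumes that $\bot$ absorbs both addition and multiplication and that $\bot^{-1} = \bot$. All names are hypothetical. Note that $0 \cdot x = 0$ no longer holds once $x$ can be $\bot$, while an identity such as $x \cdot x^{-1} = 1 + 0 \cdot x^{-1}$ holds for every element, including $0$ and $\bot$.

```python
from fractions import Fraction

BOT = object()   # the error value ⊥ added to the field (sentinel)

def add(x, y):
    """Addition in which ⊥ is absorptive."""
    return BOT if x is BOT or y is BOT else x + y

def mul(x, y):
    """Multiplication in which ⊥ is absorptive."""
    return BOT if x is BOT or y is BOT else x * y

def inv(x):
    """Total unary inverse: 0^{-1} = ⊥ and ⊥^{-1} = ⊥."""
    if x is BOT or x == 0:
        return BOT
    return 1 / Fraction(x)

def div(x, y):
    """Common division derived from the inverse: x / y = x * y^{-1}."""
    return mul(x, inv(y))

print(div(Fraction(3), Fraction(4)))  # 3/4
print(div(1, 0) is BOT)               # True: 1/0 is the error value
```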

[109]  arXiv:2405.01734 [pdf, other]
Title: Diabetic Retinopathy Detection Using Quantum Transfer Learning
Comments: 14 pages, 12 figures and 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Diabetic Retinopathy (DR), a prevalent complication in diabetes patients, can lead to vision impairment due to lesions formed on the retina. Detecting DR at an advanced stage often results in irreversible blindness. The traditional process of diagnosing DR through retina fundus images by ophthalmologists is not only time-intensive but also expensive. While classical transfer learning models have been widely adopted for computer-aided detection of DR, their high maintenance costs can hinder their detection efficiency. In contrast, Quantum Transfer Learning offers a more effective solution to this challenge. This approach is notably advantageous because it operates on heuristic principles, making it highly optimized for the task. Our proposed methodology leverages this hybrid quantum transfer learning technique to detect DR. To construct our model, we use the APTOS 2019 Blindness Detection dataset, available on Kaggle. We employ ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, and Inception V3, pre-trained classical neural networks, for initial feature extraction. For the classification stage, we use a Variational Quantum Classifier. Our hybrid quantum model has shown remarkable results, achieving an accuracy of 97% for ResNet-18. This demonstrates that quantum computing, when integrated with quantum machine learning, can perform tasks with a level of power and efficiency unattainable by classical computers alone. By harnessing these advanced technologies, we can significantly improve the detection and diagnosis of Diabetic Retinopathy, potentially saving many from the risk of blindness.
Keywords: Diabetic Retinopathy, Quantum Transfer Learning, Deep Learning

[110]  arXiv:2405.01735 [pdf, ps, other]
Title: On Smale's 17th problem over the reals
Comments: 44 pages
Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA); Optimization and Control (math.OC); Probability (math.PR)

We consider the problem of efficiently solving a system of $n$ non-linear equations in ${\mathbb R}^d$. Addressing Smale's 17th problem stated in 1998, we consider a setting whereby the $n$ equations are random homogeneous polynomials of arbitrary degrees. In the complex case and for $n= d-1$, Beltrán and Pardo proved the existence of an efficient randomized algorithm and Lairez recently showed it can be de-randomized to produce a deterministic efficient algorithm. Here we consider the real setting, to which previously developed methods do not apply. We describe an algorithm that efficiently finds solutions (with high probability) for $n= d -O(\sqrt{d\log d})$. If the maximal degree is very large, we also give an algorithm that works up to $n=d-1$.

[111]  arXiv:2405.01736 [pdf, other]
Title: PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects
Subjects: Hardware Architecture (cs.AR)

Because Deep Neural Network (DNN) models are increasingly memory-bound, inter-operator pipelining for DNN accelerators is emerging as a promising optimization. Inter-operator pipelining reduces costly on-chip global memory and off-chip memory accesses by forwarding the output of a layer as the input of the next layer within the compute array, an optimization that previous works have shown to be effective.
However, the design space of inter-operator pipelining is huge, and the space is not yet fully explored. In particular, identifying the right depth and granularity of pipelining (or no pipelining at all) is significantly dependent on the layer shapes and data volumes of weights and activations, and these are different even within a domain.
Moreover, existing works divide the substrate into large chunks and map one layer onto each chunk, which requires communicating halfway through or through the global buffer. However, for fine-grained inter-operation pipelining, placing the corresponding consumer of the next layer tile close to the producer tile of the current layer is a better way to exploit fine-grained spatial reuse.
In order to support a variable number of layers (i.e., the right depth) and multiple spatial organizations of layers (in accordance with the pipelining granularity) on the substrate, we propose PipeOrgan, a new class of spatial data organization strategies for energy-efficient and congestion-free communication between the PEs for various pipeline depths and granularities. PipeOrgan takes advantage of flexible spatial organization and can allocate layers to PEs based on the granularity of pipelining. We also propose changes to the conventional mesh topology to improve the performance of coarse-grained allocation. PipeOrgan achieves 1.95x performance improvement over the state-of-the-art pipelined dataflow on XR-bench workloads.

[112]  arXiv:2405.01738 [pdf, other]
Title: Question Suggestion for Conversational Shopping Assistants Using Product Metadata
Comments: 5 pages, 1 figure
Subjects: Computation and Language (cs.CL)

Digital assistants have become ubiquitous in e-commerce applications, following recent advancements in Information Retrieval (IR), Natural Language Processing (NLP), and Generative Artificial Intelligence (AI). However, customers are often unsure or unaware of how to converse effectively with these assistants to meet their shopping needs. In this work, we emphasize the importance of providing customers with a fast, easy-to-use, and natural way to interact with conversational shopping assistants. We propose a framework that employs Large Language Models (LLMs) to automatically generate contextual, useful, answerable, fluent, and diverse questions about products, via in-context learning and supervised fine-tuning. Recommending these questions to customers as helpful suggestions or hints to both start and continue a conversation can result in a smoother and faster shopping experience with reduced conversation overhead and friction. We perform extensive offline evaluations and discuss in detail the potential customer impact, as well as the type, length, and latency of our generated product questions if incorporated into a real-world shopping assistant.

[113]  arXiv:2405.01739 [pdf, other]
Title: Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers
Comments: Initial Submission
Subjects: Machine Learning (cs.LG)

On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. To address this, developers often face a trade-off between model accuracy and power consumption, employing either computationally intensive models on high-power cores or pared-down models on low-power cores. Both approaches typically compromise the user experience (UX). This work focuses on the use of Gated Compression (GC) layers to enhance ODML model performance while conserving power and maximizing cost-efficiency, especially for always-on use cases. GC layers dynamically regulate data flow by selectively gating the activations of neurons within the neural network and effectively filtering out non-essential inputs, which reduces power needs without compromising accuracy and enables more efficient execution on heterogeneous compute cores. These improvements enhance UX through prolonged battery life, improved device responsiveness, and greater user comfort. We integrate GC layers into vision and speech domain models, including the transformer-based ViT model. Our experiments demonstrate theoretical power efficiency gains ranging from 158x to 30,000x for always-on scenarios. This substantial improvement empowers ODML applications with enhanced UX benefits.
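
The abstract does not give the equations of a GC layer, so the following is only a minimal sketch under stated assumptions: a per-neuron sigmoid gate with a hard threshold compresses the activations, and a hypothetical `should_wake_downstream` rule decides whether enough activations survive to justify running a downstream (e.g., high-power) stage. All names, thresholds, and gate values are illustrative; in a trained model the gate logits would be learned.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_compression(activations, gate_logits, threshold=0.5):
    """Hypothetical GC-style layer: scale each activation by its sigmoid gate
    and zero it out when the gate value falls below `threshold`."""
    out = []
    for a, z in zip(activations, gate_logits):
        g = sigmoid(z)
        out.append(a * g if g >= threshold else 0.0)
    return out


def should_wake_downstream(gated, min_active=1):
    """Hypothetical input filter: forward to the downstream stage only when
    at least `min_active` activations are nonzero after gating."""
    return sum(1 for a in gated if a != 0.0) >= min_active


if __name__ == "__main__":
    acts = [0.9, -0.4, 1.2, 0.05]
    logits = [3.0, -3.0, 2.0, -1.0]  # would be learned in a real model
    gated = gated_compression(acts, logits)
    print(gated, should_wake_downstream(gated, min_active=2))
```

The zeros produced by the gate are what a sparse or early-exit execution path could skip; how the paper maps this onto heterogeneous cores is not specified here.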

[114]  arXiv:2405.01740 [pdf, ps, other]
Title: The Psychosocial Impacts of Generative AI Harms
Comments: Presented in Impact of GenAI on Social and Individual Well-being at AAAI 2024 Spring Symposium Series (2024)
Subjects: Computation and Language (cs.CL)

The rapid emergence of generative Language Models (LMs) has led to growing concern about the impacts that their unexamined adoption may have on the social well-being of diverse user groups. Meanwhile, LMs are increasingly being adopted in K-20 schools and one-on-one student settings with minimal investigation of the potential harms associated with their deployment. Motivated in part by real-world, everyday use cases (e.g., an AI writing assistant), this paper explores the potential psychosocial harms of stories generated by five leading LMs in response to open-ended prompting. We extend findings on stereotyping harms by analyzing a total of 150K 100-word stories related to student classroom interactions. Examining patterns in LM-generated character demographics and representational harms (i.e., erasure, subordination, and stereotyping), we highlight particularly egregious vignettes, illustrating the ways LM-generated outputs may influence the experiences of users with marginalized and minoritized identities, and emphasizing the need for a critical understanding of the psychosocial impacts of generative AI tools when deployed and utilized in diverse social contexts.

[115]  arXiv:2405.01741 [pdf, other]
Title: PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability and Resilience Against Parameter Corruptions
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them inevitably and increasingly susceptible to hardware faults (e.g., bit flips) that can potentially corrupt model parameters. Given this challenge, this paper aims to answer a critical question: How likely is a parameter corruption to result in an incorrect model output? To answer this question systematically, we propose a novel quantitative metric, the Parameter Vulnerability Factor (PVF), inspired by the architectural vulnerability factor (AVF) used in the computer architecture community, with the aim of standardizing the quantification of AI model resilience/vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular parameter will result in an incorrect output. As with AVF, this statistical quantity can be derived from statistically extensive and meaningful fault injection (FI) experiments. In this paper, we present several use cases applying PVF to three types of tasks/models during inference: recommendation (DLRM), vision classification (CNN), and text classification (BERT). PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency, for example by mapping vulnerable AI parameter components to well-protected hardware modules. The PVF metric is applicable to any AI model and has the potential to help unify and standardize AI vulnerability/resilience evaluation practice.
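
Following the definition above, the PVF of a parameter can be estimated by fault injection as the fraction of trials in which a corrupted copy of that parameter changes the output. The sketch below assumes a toy linear classifier, single-bit flips in the float32 encoding of one weight, "incorrect" meaning different from the fault-free prediction, and NaN/inf scores counted as incorrect. It is not the paper's FI setup for DLRM, CNN, or BERT.

```python
import math
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 31 = sign) of the float32 encoding of `value`.
    The Python float is first rounded to float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


def predict(weights, x):
    """Toy linear classifier; returns None when a fault produces NaN/inf scores."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    if not all(math.isfinite(s) for s in scores):
        return None
    return max(range(len(scores)), key=scores.__getitem__)


def estimate_pvf(weights, inputs, k, j, trials=2000, seed=0):
    """Monte Carlo estimate of P(output != fault-free output) given a single
    random bit flip in parameter weights[k][j], over random inputs and bits."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        x = rng.choice(inputs)
        golden = predict(weights, x)
        faulty = [row[:] for row in weights]
        faulty[k][j] = flip_bit(faulty[k][j], rng.randrange(32))
        if predict(faulty, x) != golden:
            mismatches += 1
    return mismatches / trials


if __name__ == "__main__":
    weights = [[0.8, -0.2, 0.1], [0.1, 0.9, -0.3]]
    inputs = [[1.0, 0.2, 0.5], [0.1, 1.0, 0.3], [0.6, 0.5, 0.9]]
    print(f"PVF(w[0][0]) ~ {estimate_pvf(weights, inputs, 0, 0):.3f}")
```

Ranking parameters by such estimates is one way the metric could guide which components deserve stronger hardware protection.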

[116]  arXiv:2405.01742 [pdf, other]
Title: Addressing Privacy Concerns in Joint Communication and Sensing for 6G Networks: Challenges and Prospects
Subjects: Networking and Internet Architecture (cs.NI)

The vision for 6G extends beyond mere communication, incorporating sensing capabilities to facilitate a diverse array of novel applications and services. However, the advent of joint communication and sensing (JCAS) technology introduces concerns regarding the handling of sensitive personally identifiable information (PII) pertaining to individuals and objects, along with external third-party data and its disclosure. Consequently, JCAS-based applications are susceptible to privacy breaches, including location tracking, identity disclosure, profiling, and misuse of sensor data, raising significant implications under the European Union's General Data Protection Regulation (GDPR) as well as other applicable standards. This paper critically examines emergent JCAS architectures and underscores the need for network functions that enable privacy-specific features in 6G systems. We propose an enhanced JCAS architecture with additional network functions and interfaces, facilitating the management of sensing policies, consent information, and transparency guidelines, alongside the integration of sensing-specific functions and storage for sensing processing sessions. Furthermore, we conduct a comprehensive threat analysis of all interfaces, employing the STRIDE security threat model and the LINDDUN privacy threat model. We also summarise the identified threats using standard Common Weakness Enumerations (CWEs). Finally, we suggest security and privacy controls as mitigation strategies to counter the identified threats stemming from the JCAS architecture.

[117]  arXiv:2405.01744 [pdf, other]
Title: ALCM: Autonomous LLM-Augmented Causal Discovery Framework
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Methodology (stat.ME)

To perform effective causal inference on high-dimensional datasets, it is imperative to begin with causal discovery, in which a causal graph is generated from observational data. However, obtaining a complete and accurate causal graph is a formidable challenge, recognized as an NP-hard problem. Recently, the advent of Large Language Models (LLMs) has ushered in a new era, demonstrating their emergent capabilities and widespread applicability in facilitating causal reasoning across diverse domains, such as medicine, finance, and science. The expansive knowledge base of LLMs could elevate the field of causal reasoning by offering interpretability, inference, generalizability, and the discovery of novel causal structures. In this paper, we introduce a new framework, the Autonomous LLM-Augmented Causal Discovery Framework (ALCM), which synergizes data-driven causal discovery algorithms and LLMs to automate the generation of more resilient, accurate, and explicable causal graphs. ALCM consists of three integral components: causal structure learning, a causal wrapper, and an LLM-driven causal refiner. These components autonomously collaborate within a dynamic environment to address causal discovery questions and deliver plausible causal graphs. We evaluate the ALCM framework by implementing two demonstrations on seven well-known datasets. Experimental results demonstrate that ALCM outperforms existing LLM methods and conventional data-driven causal reasoning mechanisms. This study not only shows the effectiveness of ALCM but also underscores new research directions in leveraging the causal reasoning capabilities of LLMs.

[118]  arXiv:2405.01745 [pdf, other]
Title: Large Language Models for UAVs: Current State and Pathways to the Future
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Unmanned Aerial Vehicles (UAVs) have emerged as a transformative technology across diverse sectors, offering adaptable solutions to complex challenges in both military and civilian domains. Their expanding capabilities present a platform for further advancement by integrating cutting-edge computational tools like Artificial Intelligence (AI) and Machine Learning (ML) algorithms. These advancements have significantly impacted various facets of human life, fostering an era of unparalleled efficiency and convenience. Large Language Models (LLMs), a key component of AI, exhibit remarkable learning and adaptation capabilities within deployed environments, demonstrating an evolving form of intelligence with the potential to approach human-level proficiency. This work explores the significant potential of integrating UAVs and LLMs to propel the development of autonomous systems. We comprehensively review LLM architectures, evaluating their suitability for UAV integration. Additionally, we summarize the state-of-the-art LLM-based UAV architectures and identify novel opportunities for LLM embedding within UAV frameworks. Notably, we focus on leveraging LLMs to refine data analysis and decision-making processes, specifically for enhanced spectral sensing and sharing in UAV applications. Furthermore, we investigate how LLM integration expands the scope of existing UAV applications, enabling autonomous data processing, improved decision-making, and faster response times in emergency scenarios like disaster response and network restoration. Finally, we highlight crucial areas for future research that are critical for facilitating the effective integration of LLMs and UAVs.

[119]  arXiv:2405.01753 [pdf, other]
Title: A Feedback Linearized Model Predictive Control Strategy for Input-Constrained Self-Driving Cars
Comments: Preprint of a manuscript currently under review for TCTS
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper proposes a novel, real-time-affordable solution to the trajectory tracking control problem for self-driving cars subject to longitudinal and steering angular velocity constraints. To this end, we develop a dual-mode Model Predictive Control (MPC) solution starting from an input-output feedback linearized description of the vehicle kinematics. First, we derive the state-dependent input constraints acting on the linearized model and characterize their worst-case time-invariant inner approximation. Then, we derive a dual-mode MPC that is real-time affordable and ensures, by design, constraint fulfillment, recursive feasibility, and uniform ultimate boundedness of the tracking error in a robust control invariant region built ad hoc. The approach's effectiveness and performance are validated via laboratory experiments on a Quanser Qcar. The results show that the proposed solution is computationally affordable and has tracking capabilities that outperform two alternative control schemes.

[120]  arXiv:2405.01754 [pdf, other]
Title: A Peer-to-Peer Energy Management Solution for Maximum Social Welfare
Subjects: Computational Engineering, Finance, and Science (cs.CE)

In smart energy communities, prosumers who both generate and consume energy play a crucial role in shaping energy management strategies. These communities use advanced platforms that enable prosumers to actively engage in local electricity markets by setting and adjusting their own energy prices. Through peer-to-peer (P2P) energy trading systems, members can directly exchange energy derived from sources such as solar photovoltaic panels, electric vehicle (EV) battery storage, and demand response (DR) programs. This direct exchange not only enhances the efficiency of the network but also fosters a dynamic energy market within the community. This article addresses parking-sharing services for EVs and the mechanisms of P2P energy scheduling, which facilitate the transfer and communication of power among different energy communities (ECs). It focuses on integrating solar power, responsive electrical loads, and EVs to optimize both economic returns and social benefits for all participants. The system is designed to ensure that all energy transactions are transparent and beneficial to the proactive consumers involved. Moreover, because of urban traffic conditions and the difficulty of finding suitable locations for EV charging and parking, houses in these communities provide parking-sharing services for EVs. This integration of energy management and urban scheduling illustrates a holistic approach to addressing both energy and transportation challenges, ultimately leading to more sustainable urban environments.

[121]  arXiv:2405.01757 [pdf, other]
Title: Towards A Double-Edged Sword: Modelling the Impact in Agile Software Development
Subjects: Software Engineering (cs.SE)

Agile methods are the state of the art in software development. Companies worldwide apply agile methods to counter market dynamics. We know that various factors, such as culture, influence the successful application of agile methods in practice, and that success differs from company to company. To counter these problems, we combine two causal models presented in the literature: the Agile Practices Impact Model and the Model of Cultural Impact. In this paper, we want to better understand the two facets of factors in agile: those influencing their application and those impacting the results when applying them. This paper's core contribution is the Agile Influence and Impact Model, which systematically describes the factors influencing agile elements and their impact on specific characteristics.

[122]  arXiv:2405.01758 [pdf, other]
Title: CGD: Constraint-Guided Diffusion Policies for UAV Trajectory Planning
Comments: 8 pages, 3 figures
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Traditional optimization-based planners, while effective, suffer from high computational costs, resulting in slow trajectory generation. A successful strategy to reduce computation time involves using Imitation Learning (IL) to develop fast neural network (NN) policies from those planners, which are treated as expert demonstrators. Although the resulting NN policies are effective at quickly generating trajectories similar to those from the expert, (1) their output does not explicitly account for dynamic feasibility, and (2) the policies do not accommodate constraints that differ from those used during training.
To overcome these limitations, we propose Constraint-Guided Diffusion (CGD), a novel IL-based approach to trajectory planning. CGD leverages a hybrid learning/online optimization scheme that combines diffusion policies with an efficient surrogate optimization problem, enabling the generation of collision-free, dynamically feasible trajectories. The key idea of CGD is to divide the original challenging optimization problem solved by the expert into two more manageable sub-problems: (a) efficiently finding collision-free paths, and (b) determining a dynamically feasible time parametrization of those paths to obtain a trajectory. Through numerical evaluations, we demonstrate significant improvements in performance and dynamic feasibility over conventional neural network architectures in scenarios with new constraints never encountered during training.

[123]  arXiv:2405.01760 [pdf, other]
Title: Reinforcement Learning-Guided Semi-Supervised Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In recent years, semi-supervised learning (SSL) has gained significant attention due to its ability to leverage both labeled and unlabeled data to improve model performance, especially when labeled data is scarce. However, most current SSL methods rely on heuristics or predefined rules for generating pseudo-labels and leveraging unlabeled data, and they are limited to exploiting standard loss functions and regularization methods. In this paper, we propose a novel Reinforcement Learning (RL) Guided SSL method, RLGSSL, that formulates SSL as a one-armed bandit problem and deploys an innovative RL loss based on weighted reward to adaptively guide the learning process of the prediction model. RLGSSL incorporates a carefully designed reward function that balances the use of labeled and unlabeled data to enhance generalization performance. A semi-supervised teacher-student framework is further deployed to increase learning stability. We demonstrate the effectiveness of RLGSSL through extensive experiments on several benchmark datasets and show that our approach achieves consistently superior performance compared to state-of-the-art SSL methods.

[124]  arXiv:2405.01762 [pdf, ps, other]
Title: EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time
Comments: 19 pages
Journal-ref: ICML 2024
Subjects: Machine Learning (cs.LG)

Understanding and explaining the predictions of Graph Neural Networks (GNNs) is crucial for enhancing their safety and trustworthiness. Subgraph-level explanations are gaining attention for their intuitive appeal. However, most existing subgraph-level explainers face efficiency challenges in explaining GNNs due to complex search processes. The key challenge is to find a balance between intuitiveness and efficiency while ensuring transparency. Additionally, these explainers usually induce subgraphs by nodes, which may introduce less intuitive disconnected nodes into the subgraph-level explanations or omit many important subgraph structures. In this paper, we reveal that inducing subgraph explanations by edges is more comprehensive than other subgraph-inducing techniques. We also emphasize the need to determine the subgraph explanation size for each data instance, as different data instances may involve different important substructures. Building upon these considerations, we introduce a training-free approach named EiG-Search. We employ an efficient linear-time search algorithm over the edge-induced subgraphs, where the edges are ranked by an enhanced gradient-based importance. We conduct extensive experiments on a total of seven datasets, demonstrating its superior performance and efficiency, both quantitatively and qualitatively, over the leading baselines.
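
The "rank, then pick a size per instance" idea can be sketched as a single scan over prefixes of the importance-ranked edge list: after sorting, only as many candidate subgraphs are scored as there are edges, rather than an exponential number. The importance values and the `toy_score` function below are hypothetical stand-ins; in EiG-Search the ranking comes from an enhanced gradient-based importance, and the score would reflect how well the edge-induced subgraph explains the GNN's prediction.

```python
def edge_induced_nodes(edges):
    """Nodes of the subgraph induced by a set of edges (no isolated nodes)."""
    return {node for edge in edges for node in edge}


def prefix_search(edges, importance, score):
    """Rank edges by importance, then scan the len(edges) prefixes of that
    ranking and return the prefix (edge-induced subgraph) with the best score."""
    ranked = sorted(edges, key=lambda e: importance[e], reverse=True)
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        s = score(ranked[:k])
        if s > best_score:
            best_k, best_score = k, s
    return ranked[:best_k], best_score


if __name__ == "__main__":
    importance = {("a", "b"): 0.9, ("b", "c"): 0.7, ("c", "d"): 0.2, ("d", "e"): 0.05}

    def toy_score(subgraph_edges, penalty=0.3):
        # Hypothetical stand-in for a fidelity score: importance gained minus a size cost.
        return sum(importance[e] for e in subgraph_edges) - penalty * len(subgraph_edges)

    chosen, _ = prefix_search(list(importance), importance, toy_score)
    print(chosen, sorted(edge_induced_nodes(chosen)))
```

Because the explanation is built from edges, every node in it is attached to at least one selected edge, which avoids the disconnected nodes that node-induced subgraphs can contain.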

[125]  arXiv:2405.01765 [pdf, other]
Title: Early years of Biased Random-Key Genetic Algorithms: A systematic review
Comments: 24 pages, 9 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

This paper presents a systematic literature review and bibliometric analysis focusing on Biased Random-Key Genetic Algorithms (BRKGA). BRKGA is a metaheuristic framework that uses random-key-based chromosomes with biased, uniform, and elitist mating strategies alongside a genetic algorithm. This review encompasses around 250 papers, covering a diverse array of applications ranging from classical combinatorial optimization problems to real-world industrial scenarios, and even non-traditional applications like hyperparameter tuning in machine learning and scenario generation for two-stage problems. In summary, this study offers a comprehensive examination of the BRKGA metaheuristic and its various applications, shedding light on key areas for future research.
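
The core BRKGA mechanics are compact enough to sketch. The version below follows the textbook scheme: chromosomes are vectors of random keys in [0, 1), a decoder sorts gene indices by key to obtain a permutation, the elite set is copied unchanged into the next generation, fresh random mutants are added, and the remaining offspring come from biased uniform crossover between an elite and a non-elite parent, inheriting each gene from the elite parent with probability rho_e > 0.5. Parameter values and the toy cost function are illustrative assumptions.

```python
import random


def decode(keys):
    """Standard random-key decoder: sort gene indices by key to get a permutation."""
    return sorted(range(len(keys)), key=keys.__getitem__)


def biased_crossover(elite, non_elite, rho_e, rng):
    """Each gene comes from the elite parent with probability rho_e (> 0.5)."""
    return [e if rng.random() < rho_e else n for e, n in zip(elite, non_elite)]


def brkga(cost, n, pop_size=50, elite_frac=0.2, mutant_frac=0.15, rho_e=0.7,
          generations=200, seed=0):
    """Minimize `cost(permutation)` over permutations of range(n)."""
    rng = random.Random(seed)
    n_elite = max(1, int(elite_frac * pop_size))
    n_mutant = int(mutant_frac * pop_size)
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda keys: cost(decode(keys)))
        elite, rest = pop[:n_elite], pop[n_elite:]
        mutants = [[rng.random() for _ in range(n)] for _ in range(n_mutant)]
        children = [
            biased_crossover(rng.choice(elite), rng.choice(rest), rho_e, rng)
            for _ in range(pop_size - n_elite - n_mutant)
        ]
        pop = elite + mutants + children  # elitism: best solutions survive unchanged
    best = min(pop, key=lambda keys: cost(decode(keys)))
    return decode(best), cost(decode(best))


if __name__ == "__main__":
    target = [3, 1, 4, 0, 2, 5, 7, 6]

    def mismatch_cost(perm):
        # Toy cost: positions where the decoded permutation differs from `target`.
        return sum(p != t for p, t in zip(perm, target))

    print(brkga(mismatch_cost, n=len(target)))
```

Only `decode` and the cost are problem-specific, which is one reason the framework transfers across the wide range of applications the review covers.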

[126]  arXiv:2405.01768 [pdf, other]
Title: CoS: Enhancing Personalization and Mitigating Bias with Context Steering
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

When querying a large language model (LLM), the context, i.e., personal, demographic, and cultural information specific to an end-user, can significantly shape the response of the LLM. For example, asking the model to explain Newton's second law with the context "I am a toddler" yields a different answer compared to the context "I am a physics professor." Proper usage of the context enables the LLM to generate personalized responses, whereas inappropriate contextual influence can lead to stereotypical and potentially harmful generations (e.g., associating "female" with "housekeeper"). In practice, striking the right balance when leveraging context is a nuanced and challenging problem that is often situation-dependent. One common approach to address this challenge is to fine-tune LLMs on contextually appropriate responses. However, this approach is expensive, time-consuming, and not controllable for end-users in different situations. In this work, we propose Context Steering (CoS), a simple, training-free method that can be easily applied to autoregressive LLMs at inference time. By measuring the contextual influence in terms of token prediction likelihood and modulating it, our method enables practitioners to determine the appropriate level of contextual influence based on their specific use case and end-user base. We showcase a variety of applications of CoS, including amplifying the contextual influence to achieve better personalization and mitigating unwanted influence to reduce model bias. In addition, we show that we can combine CoS with Bayesian Inference to quantify the extent of hate speech on the internet. We demonstrate the effectiveness of CoS on state-of-the-art LLMs and benchmarks.
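
One way to read "measuring the contextual influence in terms of token prediction likelihood and modulating it" is sketched below, under the assumption that influence is the difference between next-token log-probabilities with and without the context, and that steering adds lambda times that difference to the with-context log-probabilities before renormalizing. Under this form, lambda = 0 recovers the ordinary with-context distribution, lambda = -1 recovers the no-context distribution, and lambda > 0 amplifies the context. The function names and toy logits are hypothetical; a real implementation would obtain both sets of logits from two forward passes of the LLM.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def context_steered_probs(logits_with_ctx, logits_without_ctx, lam):
    """Next-token distribution with the contextual influence scaled by `lam`."""
    lp_ctx = log_softmax(logits_with_ctx)
    lp_none = log_softmax(logits_without_ctx)
    # Assumed influence: how much the context raises or lowers each token's log-prob.
    steered = [c + lam * (c - n) for c, n in zip(lp_ctx, lp_none)]
    return [math.exp(s) for s in log_softmax(steered)]  # renormalize


if __name__ == "__main__":
    # Hypothetical next-token logits over a 3-token vocabulary, with and without
    # a user context such as "I am a toddler".
    with_ctx = [2.0, 0.5, -1.0]
    without_ctx = [0.5, 1.5, -1.0]
    for lam in (-1.0, 0.0, 1.0):
        probs = context_steered_probs(with_ctx, without_ctx, lam)
        print(lam, [round(p, 3) for p in probs])
```

Negative values between -1 and 0 would dampen the context (e.g., to reduce bias), while positive values push generations further toward the context (e.g., for personalization).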

[127]  arXiv:2405.01769 [pdf, other]
Title: A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
Comments: 35 pages, 6 figures
Subjects: Computation and Language (cs.CL)

In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications, challenges, and forward-looking opportunities of LLMs within these high-stakes sectors. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. Moreover, we critically examine the ethics of LLM applications in these fields, pointing out existing ethical concerns and the need for transparent, fair, and robust AI systems that respect regulatory norms. By presenting a thorough review of current literature and practical applications, we showcase the transformative impact of LLMs and outline the imperative for interdisciplinary cooperation, methodological advancements, and ethical vigilance. Through this lens, we aim to spark dialogue and inspire future research dedicated to maximizing the benefits of LLMs while mitigating their risks in these precision-dependent sectors. To facilitate future research on LLMs in these critical societal domains, we also initiate a reading list that tracks the latest advancements on this topic and will be continually updated: https://github.com/czyssrs/LLM_X_papers

[128]  arXiv:2405.01771 [pdf, other]
Title: Towards Predicting Collective Performance in Multi-Robot Teams
Subjects: Robotics (cs.RO)

The increased deployment of multi-robot systems (MRS) in various fields has led to the need for analysis of system-level performance. However, creating consistent metrics for MRS is challenging due to the wide range of system and environmental factors, such as team size and environment size. This paper presents a new analytical framework for MRS based on dimensionless variable analysis, a mathematical technique typically used to simplify complex physical systems. This approach effectively condenses the complex parameters influencing MRS performance into a manageable set of dimensionless variables. We form dimensionless variables which encapsulate key parameters of the robot team and task. Then we use these dimensionless variables to fit a parametric model of team performance. Our model successfully identifies critical performance determinants and their interdependencies, providing insight for MRS design and optimization. The application of dimensionless variable analysis to MRS offers a promising method for MRS analysis that effectively reduces complexity, enhances comprehension of system behaviors, and informs the design and management of future MRS deployments.
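
The abstract does not list the team and task variables used, so the sketch below uses hypothetical ones (team size, environment area, robot speed, sensing radius, mission time) to show the basic check behind dimensionless variable analysis in the style of the Buckingham Pi theorem: a product of powers of variables is a valid dimensionless group only if the exponents of every base dimension cancel.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical MRS variables and their base-dimension exponents (L = length, T = time).
DIMENSIONS = {
    "N": {},                 # number of robots (already dimensionless)
    "A": {"L": 2},           # environment area
    "v": {"L": 1, "T": -1},  # robot speed
    "r": {"L": 1},           # sensing radius
    "t": {"T": 1},           # mission time
}


def group_dimension(exponents):
    """Net base-dimension exponents of prod(var ** power); empty dict = dimensionless."""
    total = Counter()
    for var, power in exponents.items():
        for base, e in DIMENSIONS[var].items():
            total[base] += Fraction(power) * e
    return {b: e for b, e in total.items() if e != 0}


def is_dimensionless(exponents):
    return not group_dimension(exponents)


def evaluate_group(exponents, values):
    """Numerical value of prod(values[var] ** power)."""
    result = 1.0
    for var, power in exponents.items():
        result *= values[var] ** power
    return result


if __name__ == "__main__":
    coverage = {"A": 1, "r": -2}          # area relative to sensing footprint
    travel = {"v": 1, "t": 1, "r": -1}    # distance travelled in sensing radii
    for name, group in (("coverage", coverage), ("travel", travel)):
        print(name, is_dimensionless(group))
    print(evaluate_group(coverage, {"A": 100.0, "r": 2.0}))
```

Groups that pass this check can then serve as the inputs of a parametric performance model, which is how the abstract describes using them.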

[129]  arXiv:2405.01772 [pdf, other]
Title: Unconstraining Multi-Robot Manipulation: Enabling Arbitrary Constraints in ECBS with Bounded Sub-Optimality
Comments: The first two authors contributed equally. Accepted to SoCS 2024
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)

Multi-Robot-Arm Motion Planning (M-RAMP) is a challenging problem featuring complex single-agent planning and multi-agent coordination. Recent advancements in extending the popular Conflict-Based Search (CBS) algorithm have made large strides in solving Multi-Agent Path Finding (MAPF) problems. However, fundamental challenges remain in applying CBS to M-RAMP. A core challenge is the existing reliance of the CBS framework on conservative "complete" constraints. These constraints ensure solution guarantees but often result in slow pruning of the search space -- causing repeated expensive single-agent planning calls. Therefore, even though it is possible to leverage domain knowledge and design incomplete M-RAMP-specific CBS constraints to more efficiently prune the search, using these constraints would render the algorithm itself incomplete. This forces practitioners to choose between efficiency and completeness.
In light of these challenges, we propose a novel algorithm, Generalized ECBS, aimed at removing the burden of choice between completeness and efficiency in MAPF algorithms. Our approach enables the use of arbitrary constraints in conflict-based algorithms while preserving completeness and bounding sub-optimality. This enables practitioners to capitalize on the benefits of arbitrary constraints and opens a new space for constraint design in MAPF that has not been explored. We provide a theoretical analysis of our algorithms, propose new "incomplete" constraints, and demonstrate their effectiveness through experiments in M-RAMP.
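
Bounded sub-optimality in ECBS-style algorithms comes from focal search: instead of always expanding the node with the smallest lower bound, the search may expand any open node whose cost is within a factor w of that bound, preferring nodes with fewer conflicts. The sketch below illustrates only this selection rule, with a hypothetical `Node` record and made-up costs; it is not Generalized ECBS and does not model the arbitrary constraints the paper introduces.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    lower_bound: float  # admissible lower bound on the cost below this node
    cost: float         # cost of the node's current (possibly suboptimal) plan
    conflicts: int      # conflicts remaining in that plan


def select_focal(open_list, w):
    """ECBS-style focal selection: among open nodes with cost <= w * (smallest
    lower bound), pick the one with the fewest conflicts (ties: lower cost)."""
    lb_min = min(n.lower_bound for n in open_list)
    focal = [n for n in open_list if n.cost <= w * lb_min]
    if not focal:
        # Toy fallback for inputs where no node satisfies cost <= w * lower bound.
        return min(open_list, key=lambda n: n.lower_bound)
    return min(focal, key=lambda n: (n.conflicts, n.cost))


if __name__ == "__main__":
    open_list = [Node("A", 10, 12, 3), Node("B", 11, 13, 0), Node("C", 12, 16, 0)]
    print(select_focal(open_list, w=1.5).name)
```

Any solution reached this way costs at most w times the best lower bound, which is the kind of guarantee the paper aims to retain while allowing arbitrary constraints.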

[130]  arXiv:2405.01774 [pdf, other]
Title: One-Shot Wyner-Ziv Compression of a Uniform Source
Subjects: Information Theory (cs.IT)

In this paper, we consider the one-shot version of the classical Wyner-Ziv problem, in which a source is compressed in a lossy fashion when only the decoder has access to correlated side information. Following the entropy-constrained quantization framework, we assume a scalar quantizer followed by variable-length entropy coding. We consider compression of a uniform source, motivated by its role in the compression of processes with low-dimensional features embedded within a high-dimensional ambient space. We find upper and lower bounds on the entropy-distortion functions of the uniform source for quantized and noisy side information, and illustrate the tightness of the bounds at high compression rates.
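
For intuition about why decoder side information lowers the rate, the sketch below assumes X ~ Uniform[0, 1), a uniform scalar quantizer with N cells, and modulo binning, a textbook Wyner-Ziv-style construction that is not necessarily the scheme analyzed in the paper. Without side information, the cell index of a uniform source is itself uniform, so entropy coding needs log2(N) bits and the mean squared error is (1/N)^2/12. With binning, the encoder sends only the index modulo M (log2(M) bits), and the decoder picks the cell in that bin closest to its side information Y; this recovers the correct cell only when Y is close enough to X.

```python
import math


def quantize(x, n_cells):
    """Uniform scalar quantizer on [0, 1): index of the cell containing x."""
    return min(int(x * n_cells), n_cells - 1)


def reconstruct(index, n_cells):
    """Midpoint reconstruction of a cell."""
    return (index + 0.5) / n_cells


def baseline_rate_distortion(n_cells):
    """Index entropy (bits) and MSE for X ~ Uniform[0, 1) without side information."""
    step = 1.0 / n_cells
    return math.log2(n_cells), step ** 2 / 12.0


def wz_encode(x, n_cells, n_bins):
    """Send only the bin label: cell index modulo n_bins (log2(n_bins) bits)."""
    return quantize(x, n_cells) % n_bins


def wz_decode(bin_index, y, n_cells, n_bins):
    """Choose the cell in the received bin whose midpoint is closest to y."""
    candidates = range(bin_index, n_cells, n_bins)
    best = min(candidates, key=lambda i: abs(reconstruct(i, n_cells) - y))
    return reconstruct(best, n_cells)


if __name__ == "__main__":
    n_cells, n_bins = 16, 4
    print("no side info:", baseline_rate_distortion(n_cells))
    x, y = 0.37, 0.40  # y: illustrative side information correlated with x
    label = wz_encode(x, n_cells, n_bins)
    print("bits sent:", math.log2(n_bins), "x_hat:", wz_decode(label, y, n_cells, n_bins))
```

When Y is far from X, the decoder picks the wrong cell in the bin, which is the kind of error the paper's entropy-distortion bounds have to account for.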

[131]  arXiv:2405.01775 [pdf, other]
Title: Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Comments: Accepted for publication at MLSys 2024
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

The development of model compression is continuously motivated by the evolution of neural network accelerators based on ASICs or FPGAs. On the algorithm side, the ultimate goal of quantization or pruning is to accelerate expensive DNN computations on low-power hardware. However, such a "design-and-deploy" workflow faces under-explored challenges in the current hardware-algorithm co-design community. First, although state-of-the-art quantization algorithms can achieve low precision with negligible degradation of accuracy, the latest deep learning frameworks (e.g., PyTorch) only support non-customizable 8-bit precision, data formats, and parameter extraction. Second, the objective of quantization is to enable computation with low-precision data. However, current SoTA algorithms treat the quantized integer as an intermediate result, while the final output of the quantizer is the "discretized" floating-point values, ignoring practical needs and adding workload for hardware designers, who must perform integer parameter extraction and layer fusion themselves. Finally, the compression toolkits designed by industry are constrained to their in-house products or a handful of algorithms. The limited degree of freedom in current toolkits and the under-explored customization hinder prototype ASIC- or FPGA-based accelerator design. To resolve these challenges, we propose Torch2Chip, an open-source, fully customizable, and high-performance toolkit that supports user-designed compression followed by automatic model fusion and parameter extraction. Torch2Chip incorporates a hierarchical design workflow, and the user-customized compression algorithm is directly packed into a deployment-ready format for prototype chip verification with either a CNN or a vision transformer (ViT). The code is available at https://github.com/SeoLabCornell/torch2chip.
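
The gap between "discretized" floating-point outputs and deployable integers can be shown in a few lines. The sketch assumes symmetric per-tensor quantization to signed n-bit integers; the helper names are hypothetical and are not Torch2Chip's API. A fake-quantized dot product and an integer multiply-accumulate followed by a single rescale by scale_w * scale_x give the same result up to floating-point rounding, but only the integer path is what a low-precision accelerator actually executes, which is why the integer parameters and scales have to be extracted.

```python
def quantize_to_int(values, scale, n_bits=8):
    """Symmetric per-tensor quantization to signed n-bit integers.
    Note: Python's round() rounds halves to even."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def fake_quantize(values, scale, n_bits=8):
    """'Discretized' floating-point output: integers immediately mapped back to floats."""
    return [q * scale for q in quantize_to_int(values, scale, n_bits)]


def integer_dot(q_w, q_x, scale_w, scale_x):
    """What integer hardware would compute: an integer MAC, then one rescale."""
    acc = sum(a * b for a, b in zip(q_w, q_x))
    return acc * (scale_w * scale_x)


if __name__ == "__main__":
    w, x = [0.3, -0.7, 1.1], [0.9, 0.2, -0.4]
    s_w, s_x = 0.01, 0.02  # illustrative scales
    float_path = sum(a * b for a, b in zip(fake_quantize(w, s_w), fake_quantize(x, s_x)))
    int_path = integer_dot(quantize_to_int(w, s_w), quantize_to_int(x, s_x), s_w, s_x)
    print(float_path, int_path)
```

The integers returned by `quantize_to_int` and the product of the two scales are exactly the quantities a hardware designer would otherwise have to recover by hand from fake-quantized floats.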

[132]  arXiv:2405.01776 [pdf, other]
Title: An Approach to Systematic Data Acquisition and Data-Driven Simulation for the Safety Testing of Automated Driving Functions
Comments: 8 pages, 5 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

With the growing complexity and criticality of automated driving functions in road traffic and of their operational design domains (ODD), there is increasing demand for covering significant proportions of development, validation, and verification in virtual environments and through simulation models.
If, however, simulations are meant not only to augment real-world experiments but to replace them, quantitative approaches are required that measure to what degree and under which preconditions simulation models adequately represent reality, so that their results can be used accordingly. Particularly in R&D areas related to the safety impact of the "open world", there is a significant shortage of real-world data to parameterize and/or validate simulations, especially with respect to the behavior of human traffic participants, whom automated driving functions will meet in mixed traffic.
We present an approach to systematically acquire data in public traffic by heterogeneous means, transform it into a unified representation, and use it to automatically parameterize traffic behavior models for use in data-driven virtual validation of automated driving functions.

[133]  arXiv:2405.01778 [pdf, other]
Title: Hierarchical mixture of discriminative Generalized Dirichlet classifiers
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper presents a discriminative classifier for compositional data. This classifier is based on the posterior distribution of the Generalized Dirichlet, which is the discriminative counterpart of the Generalized Dirichlet mixture model. Moreover, following the mixture-of-experts paradigm, we propose a hierarchical mixture of this classifier. To learn the model parameters, we use a variational approximation by deriving an upper bound for the Generalized Dirichlet mixture. To the best of our knowledge, this is the first time this bound has been proposed in the literature. Experimental results are presented for spam detection and color space identification.
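As a minimal sketch of the posterior such a classifier builds on, the following Python computes the Connor-Mosimann Generalized Dirichlet (GD) log-density (in the parameterization popularized by Wong) and turns per-class densities into class posteriors via Bayes' rule. The class parameters and priors are fixed by hand and purely illustrative; the paper's variational parameter learning and its hierarchical mixture are not shown.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gd_log_density(x, alpha, beta):
    """Log-density of a Generalized Dirichlet at x = (x_1, ..., x_D).

    x lies in the open simplex (x_i > 0, sum(x) < 1); the last component
    x_{D+1} = 1 - sum(x) is implicit. alpha and beta have length D.
    """
    d = len(x)
    log_p = 0.0
    cumulative = 0.0
    for i in range(d):
        cumulative += x[i]
        # gamma_i = beta_i - alpha_{i+1} - beta_{i+1}, and gamma_D = beta_D - 1
        gamma = beta[i] - alpha[i + 1] - beta[i + 1] if i < d - 1 else beta[i] - 1.0
        log_p += ((alpha[i] - 1.0) * math.log(x[i])
                  + gamma * math.log(1.0 - cumulative)
                  - log_beta(alpha[i], beta[i]))
    return log_p


def class_posterior(x, classes):
    """classes: list of (prior, alpha, beta); returns p(class | x) by Bayes' rule."""
    scores = [math.log(prior) + gd_log_density(x, a, b) for prior, a, b in classes]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # log-sum-exp for stability
    total = sum(weights)
    return [w / total for w in weights]


classes = [
    (0.5, [2.0, 2.0], [4.0, 2.0]),  # hypothetical class 0 parameters
    (0.5, [1.0, 3.0], [5.0, 1.5]),  # hypothetical class 1 parameters
]
print(class_posterior([0.2, 0.5], classes))
```

With D = 1 the density reduces to a Beta distribution, and choosing beta_i = alpha_{i+1} + beta_{i+1} recovers an ordinary Dirichlet, which makes the sketch easy to sanity-check.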

[134]  arXiv:2405.01783 [pdf, ps, other]
Title: Layers of technology in pluriversal design. Decolonising language technology with the LiveLanguage initiative
Subjects: Computation and Language (cs.CL)

Language technology has the potential to facilitate intercultural communication through meaningful translations. However, the current state of language technology is deeply entangled with colonial knowledge due to path dependencies and neo-colonial tendencies in the global governance of artificial intelligence (AI). Language technology is a complex and emerging field that presents challenges for co-design interventions due to its knowledge intensity and its enfolding in assemblages of global scale and diverse sites. This paper uses LiveLanguage, a lexical database and set of services with particular emphasis on modelling language diversity and integrating small and minority languages, as an example to discuss and close the gap between pluriversal design theory and practice. By diversifying the concept of emerging technology, we can better approach language technology in global contexts. The paper presents a model comprising five layers of technological activity. Each layer consists of specific practices and stakeholders, thus providing distinctive spaces for co-design interventions as a mode of inquiry for de-linking, re-thinking, and re-building language technology towards pluriversality. In that way, the paper contributes to reflecting on the position of co-design in decolonising emergent technologies, and to integrating complex theoretical knowledge towards decoloniality into language technology design.

[135]  arXiv:2405.01785 [pdf, other]
Title: Towards Green Communication: Soft Decoding Scheme for OOK Signals in Zero-Energy Devices
Comments: Accepted in IEEE International Communications Conference (ICC) workshop, Denver, Jun 2024
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The boom in the Internet of Things (IoT) is expected to provide more intelligent and reliable communication services for higher network coverage, massive connectivity, and low-cost solutions for 6G services. However, the frequent charging and battery replacement required by these massive numbers of IoT devices bring a series of challenges. Zero-energy devices, which rely on energy-harvesting technologies and can operate without battery replacement or charging, play a pivotal role in enabling the massive deployment of IoT devices. To enable reliable communication for such low-power devices, Manchester-coded on-off keying (OOK) modulation and non-coherent detection are attractive techniques due to their energy efficiency, robustness in noisy environments, and simple receiver design. Moreover, to extend their communication range, employing channel coding along with enhanced detection schemes is crucial. In this paper, a novel soft-decision decoder is designed for OOK-based low-power receivers to enhance their detection performance. In addition, exact closed-form expressions and two simplified approximations are derived for the log-likelihood ratio (LLR), an essential metric for soft decoding. Numerical results demonstrate the significant coverage gain achieved through soft decoding for convolutional codes.
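The role of the LLR can be illustrated with a textbook model that is an assumption here, not the paper's derivation: each Manchester-coded bit occupies two chips, bit 0 turns the carrier on in the first chip and bit 1 in the second, the amplitude A and complex noise variance sigma^2 are known, and the receiver measures each chip's envelope non-coherently. The "on" envelope is then Rician and the "off" envelope Rayleigh; the common factors cancel, leaving LLR = ln I0(2A r1/sigma^2) - ln I0(2A r2/sigma^2). Since ln I0(z) grows like z for large z, a linear high-SNR approximation follows. The paper's own closed forms and approximations may differ.

```python
import math


def log_i0(z):
    """ln I0(z) for z >= 0, via the power series sum_k (z/2)^(2k) / (k!)^2.

    Terms are summed in the log domain (log-sum-exp) so large z does not overflow.
    """
    if z == 0.0:
        return 0.0
    # The largest terms sit near k ~ z/2; beyond k ~ z each term shrinks by >= 4x.
    n_terms = int(z) + 60
    logs = [2 * k * math.log(z / 2) - 2 * math.lgamma(k + 1) for k in range(n_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def manchester_ook_llr(r1, r2, amplitude, noise_var):
    """LLR = ln p(r | bit 0) / p(r | bit 1) for chip envelopes r1, r2.

    Assumes bit 0 -> (on, off), bit 1 -> (off, on), Rician 'on' / Rayleigh 'off'.
    """
    c = 2.0 * amplitude / noise_var
    return log_i0(c * r1) - log_i0(c * r2)


def manchester_ook_llr_approx(r1, r2, amplitude, noise_var):
    """High-SNR approximation using ln I0(z) ~ z."""
    return 2.0 * amplitude * (r1 - r2) / noise_var


print(manchester_ook_llr(1.5, 0.2, 1.0, 0.5), manchester_ook_llr_approx(1.5, 0.2, 1.0, 0.5))
```

A positive LLR favors bit 0 under this convention; a soft-decision convolutional decoder would consume these values in place of hard bit decisions.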

[136]  arXiv:2405.01787 [pdf, other]
Title: Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*.
Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem: producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker.
Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding is that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compares favorably with that of large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.
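One plausible reading of type-based retrieval augmentation is to rank existing definitions by how similar their types are to the goal type and place the top hits in the model's prompt. The Python sketch below uses Jaccard similarity over identifier tokens; the scoring function, the tokenizer, and the toy F*-style signatures are all hypothetical and are not taken from the paper.

```python
import re


def type_tokens(fstar_type):
    """Crude tokenizer: identifiers and qualified names in an F*-style type."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_.']*", fstar_type))


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_by_type(goal_type, library, k=2):
    """Return the names of the k definitions whose types best match goal_type.

    library: dict mapping a definition name to its type (as a string).
    """
    goal = type_tokens(goal_type)
    ranked = sorted(library,
                    key=lambda name: jaccard(goal, type_tokens(library[name])),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy library of lemma signatures.
library = {
    "append_length": "l1:list 'a -> l2:list 'a -> Lemma (length (l1 @ l2) == length l1 + length l2)",
    "rev_involutive": "l:list 'a -> Lemma (rev (rev l) == l)",
    "nat_add_comm": "x:nat -> y:nat -> Lemma (x + y == y + x)",
}
goal = "l:list 'a -> Lemma (length (rev l) == length l)"
print(retrieve_by_type(goal, library))  # ['rev_involutive', 'append_length']
```

Any real system would likely use a richer similarity (e.g., on elaborated types or embeddings); token overlap is only the simplest instance of the idea.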

[137]  arXiv:2405.01790 [pdf, other]
Title: Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization
Comments: Accepted at VarDial 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles. However, summarization models are increasingly being used to summarize diverse sources of text, such as social media data, that encompass a wide demographic user base. It is thus crucial to assess not only the quality of the generated summaries, but also the extent to which they can fairly represent the opinions of diverse social groups. Position bias, a long-known issue in news summarization, has received limited attention in the context of social multi-document summarization. We investigate this phenomenon in depth by analyzing the effect of group ordering in input documents when summarizing tweets from three distinct linguistic communities: African-American English, Hispanic-aligned Language, and White-aligned Language. Our empirical analysis shows that although the textual quality of the summaries remains consistent regardless of the input document order, in terms of fairness, the results vary significantly depending on how the dialect groups are presented in the input data. Our results suggest that position bias manifests differently in social multi-document summarization, severely impacting the fairness of summarization models.
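The mechanism can be illustrated with a deliberately position-biased toy: a "lead-k" extractive summarizer that simply keeps the first k input posts. Ordering the input by dialect group then decides which group the summary represents, even though every output is equally fluent. The group labels and posts are placeholders, and this toy is not one of the models evaluated in the paper.

```python
from collections import Counter


def lead_k_summary(posts, k):
    """A maximally position-biased summarizer: keep the first k posts."""
    return posts[:k]


def group_shares(summary):
    """Fraction of summary sentences drawn from each group."""
    counts = Counter(group for group, _ in summary)
    return {group: n / len(summary) for group, n in counts.items()}


group_a = [("A", f"post A{i}") for i in range(4)]
group_b = [("B", f"post B{i}") for i in range(4)]

a_first = lead_k_summary(group_a + group_b, k=4)
b_first = lead_k_summary(group_b + group_a, k=4)
interleaved = lead_k_summary([p for pair in zip(group_a, group_b) for p in pair], k=4)

print(group_shares(a_first))      # {'A': 1.0}
print(group_shares(b_first))      # {'B': 1.0}
print(group_shares(interleaved))  # {'A': 0.5, 'B': 0.5}
```

Real summarizers are less extreme, but any systematic preference for early input positions pushes group representation in the same direction.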

[138]  arXiv:2405.01792 [pdf, other]
Title: Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots
Journal-ref: Science Robotics, 2024, Vol 9, Issue 89
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Autonomous wheeled-legged robots have the potential to transform logistics systems, improving operational efficiency and adaptability in urban environments. Navigating urban environments, however, poses unique challenges for robots, necessitating innovative solutions for locomotion and navigation. These challenges include the need for adaptive locomotion across varied terrains and the ability to navigate efficiently around complex dynamic obstacles. This work introduces a fully integrated system comprising adaptive locomotion control, mobility-aware local navigation planning, and large-scale path planning within the city. Using model-free reinforcement learning (RL) techniques and privileged learning, we develop a versatile locomotion controller. This controller achieves efficient and robust locomotion over various rough terrains, facilitated by smooth transitions between walking and driving modes. It is tightly integrated with a learned navigation controller through a hierarchical RL framework, enabling effective navigation through challenging terrain and various obstacles at high speed. Our controllers are integrated into a large-scale urban navigation system and validated by autonomous, kilometer-scale navigation missions conducted in Zurich, Switzerland, and Seville, Spain. These missions demonstrate the system's robustness and adaptability, underscoring the importance of integrated control systems in achieving seamless navigation in complex environments. Our findings support the feasibility of wheeled-legged robots and hierarchical RL for autonomous navigation, with implications for last-mile delivery and beyond.

[139]  arXiv:2405.01793 [pdf, other]
Title: Formalizing Pick's Theorem in Isabelle/HOL
Subjects: Logic in Computer Science (cs.LO)

We formalize Pick's theorem for finding the area of a simple polygon whose vertices are integral lattice points. We are inspired by John Harrison's formalization of Pick's theorem in HOL Light, but tailor our proof approach to avoid a primary challenge in his formalization, which is proving that any polygon with more than three vertices can be split (in its interior) by a line between two of its vertices. We detail the approach we use to avoid this step and reflect on the pros and cons of our eventual formalization strategy. We use the theorem prover Isabelle/HOL, and our formalization involves augmenting the existing geometry libraries in various foundational ways (e.g., by adding the definition of a polygon and formalizing some key properties thereof).
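For readers who want to see the statement itself in action, Pick's theorem says that a simple lattice polygon with I interior and B boundary lattice points has area A = I + B/2 - 1. The Python sketch below is a numerical illustration, not the Isabelle/HOL development: it computes twice the area with the shoelace formula, counts boundary points with a gcd per edge, and solves Pick's formula for I.

```python
from math import gcd


def twice_area(vertices):
    """Shoelace formula: 2 * area of a simple polygon (vertices in order)."""
    n = len(vertices)
    s = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s)


def boundary_points(vertices):
    """Lattice points on the boundary: an edge (dx, dy) contributes gcd(|dx|, |dy|)."""
    edges = zip(vertices, vertices[1:] + vertices[:1])
    return sum(gcd(abs(x2 - x1), abs(y2 - y1)) for (x1, y1), (x2, y2) in edges)


def interior_points(vertices):
    """Pick: A = I + B/2 - 1, so I = (2A - B + 2) / 2."""
    return (twice_area(vertices) - boundary_points(vertices) + 2) // 2


triangle = [(0, 0), (4, 0), (0, 3)]
b, i = boundary_points(triangle), interior_points(triangle)
print(b, i, i + b / 2 - 1, twice_area(triangle) / 2)  # 8 3 6.0 6.0
```

The interesting part of a formal proof is, of course, establishing the formula for every simple lattice polygon, which no finite computation can do.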

[140]  arXiv:2405.01794 [pdf, ps, other]
Title: New design of smooth PSO-IPF navigator with kinematic constraints
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Robotic applications across industries demand advanced navigation for safe and smooth movement. Smooth path planning is crucial for mobile robots to ensure stable and efficient navigation, as it minimizes jerky movements and enhances overall performance. Achieving this requires smooth, collision-free paths. Particle Swarm Optimization (PSO) and Potential Field (PF) methods are notable path-planning techniques; however, they may struggle to produce smooth paths due to their inherent algorithms, potentially leading to suboptimal robot motion and increased energy consumption. In addition, while PSO efficiently explores solution spaces, it generates long paths and has limited global search. In contrast, PF methods offer concise paths but struggle with distant targets or obstacles. To address this, we propose Smoothed Particle Swarm Optimization with Improved Potential Field (SPSO-IPF), which combines both approaches and can generate smooth and safe paths. Our research demonstrates SPSO-IPF's superiority, showing its effectiveness in static and dynamic environments compared to a pure PSO or a pure PF approach.
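To make the PF ingredient concrete, here is a minimal Python sketch of a classic potential-field planner (linear attraction to the goal, Khatib-style repulsion within an influence radius d0) followed by a moving-average smoothing pass. The gains, radius, step size, and smoothing window are arbitrary assumptions; the PSO component and the paper's "improved" potential field are not reproduced. The sketch also exhibits the local-minimum behavior that motivates pairing PF with a global search.

```python
import math


def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0, step=0.05):
    """One gradient-descent step on a classic attractive + repulsive potential."""
    # Attractive force pulls linearly toward the goal.
    fx = -k_att * (pos[0] - goal[0])
    fy = -k_att * (pos[1] - goal[1])
    # Repulsive force acts only within the influence radius d0 of each obstacle.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            magnitude = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
            fx += magnitude * dx / d
            fy += magnitude * dy / d
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # stationary point: goal reached or a local minimum
    # Cap the step at the force magnitude (with k_att = 1 and no nearby
    # obstacles this is the distance to the goal, so the goal is not overshot).
    length = min(step, norm)
    return (pos[0] + length * fx / norm, pos[1] + length * fy / norm)


def smooth(path, window=3):
    """Moving-average smoothing that keeps the start and end points fixed."""
    half = window // 2
    out = [path[0]]
    for i in range(1, len(path) - 1):
        pts = path[max(0, i - half): min(len(path), i + half + 1)]
        out.append((sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts)))
    if len(path) > 1:
        out.append(path[-1])
    return out


def plan(start, goal, obstacles, max_iters=2000, tol=0.05):
    path, pos = [start], start
    for _ in range(max_iters):
        if math.dist(pos, goal) < tol:
            break
        pos = potential_field_step(pos, goal, obstacles)
        path.append(pos)
    return smooth(path)


print(plan((0.0, 0.0), (4.0, 0.0), [(2.0, 0.6)])[-1])
```

With an obstacle directly between the robot and the goal, the attractive and repulsive forces can cancel and pure PF stalls, which is one reason hybrid methods add a global optimizer such as PSO.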

[141]  arXiv:2405.01795 [pdf, ps, other]
Title: The Role of Human Factors in the LastPass Breach
Authors: Niroop Sugunaraj
Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR)

This paper examines the complex nature of cyber attacks through an analysis of the LastPass breach. It argues for the integration of human-centric considerations into cybersecurity measures, focusing on mitigating factors such as goal-directed behavior, cognitive overload, human biases (e.g., optimism, anchoring), and risky behaviors. Findings from an analysis of this breach support the perspective that addressing both the human and technical dimensions of cyber defense can significantly enhance the resilience of cyber systems against complex threats. In practice, maintaining a balanced approach while simplifying user interactions, making users aware of biases, and discouraging risky practices is essential for preventing cyber incidents.

[142]  arXiv:2405.01796 [pdf, other]
Title: TOPICAL: TOPIC Pages AutomagicaLly
Comments: 10 pages, 7 figures, 2 tables, NAACL System Demonstrations 2024
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Information Retrieval (cs.IR)

Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated process to generate high-quality topic pages for scientific entities, with a focus on biomedical concepts. We release TOPICAL, a web app and associated open-source code, comprising a model pipeline combining retrieval, clustering, and prompting, that makes it easy for anyone to generate topic pages for a wide variety of biomedical entities on demand. In a human evaluation of 150 diverse topic pages generated using TOPICAL, we find that the vast majority were considered relevant, accurate, and coherent, with correct supporting citations. We make all code publicly available and host a free-to-use web app at: https://s2-topical.apps.allenai.org

[143]  arXiv:2405.01797 [pdf, other]
Title: Learning under Imitative Strategic Behavior with Unforeseeable Outcomes
Subjects: Artificial Intelligence (cs.AI)

Machine learning systems have been widely used to make decisions about individuals who may best respond and behave strategically to receive favorable outcomes, e.g., they may genuinely improve the true labels or manipulate observable features directly to game the system without changing labels. Although both behaviors have been studied (often as two separate problems) in the literature, most works assume individuals can (i) perfectly foresee the outcomes of their behaviors when they best respond; (ii) change their features arbitrarily as long as it is affordable, and the costs they need to pay are deterministic functions of feature changes. In this paper, we consider a different setting and focus on imitative strategic behaviors with unforeseeable outcomes, i.e., individuals manipulate/improve by imitating the features of those with positive labels, but the induced feature changes are unforeseeable. We first propose a Stackelberg game to model the interplay between individuals and the decision-maker, under which we examine how the decision-maker's ability to anticipate individual behavior affects its objective function and the individual's best response. We show that the objective difference between the two can be decomposed into three interpretable terms, with each representing the decision-maker's preference for a certain behavior. By exploring the roles of each term, we further illustrate how a decision-maker with adjusted preferences can simultaneously disincentivize manipulation, incentivize improvement, and promote fairness.

[144]  arXiv:2405.01798 [pdf, other]
Title: The Economy and Public Diplomacy: An Analysis of RT's Economic Content and Context on Facebook
Comments: 14 pages, 6 figures
Subjects: Information Theory (cs.IT); General Economics (econ.GN)

With the rise of globalization, the impacts of economic interdependence have become a prominent factor affecting personal lives as well as national and international dynamics. This study examines RT's public diplomacy efforts on its non-Russian Facebook accounts over the past five years to identify the prominence of economic topics across language accounts. Computational analysis, including word embeddings and statistical methods, investigates how offline economic indicators, like currency values and oil prices, correspond to changes in RT's online economic content. The results demonstrate that RT uses message reinforcement associated with economic topics as an audience-targeting strategy and differentiates its use with changing currency and oil values.

[145]  arXiv:2405.01799 [pdf, other]
Title: Exploiting ChatGPT for Diagnosing Autism-Associated Language Disorders and Identifying Distinct Features
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Diagnosing language disorders associated with autism is a complex and nuanced challenge, often hindered by the subjective nature and variability of traditional assessment methods. Traditional diagnostic methods not only require intensive human effort but also often result in delayed interventions due to their lack of speed and specificity. In this study, we explored the application of ChatGPT, a state-of-the-art large language model, to overcome these obstacles by enhancing diagnostic accuracy and profiling specific linguistic features indicative of autism. Leveraging ChatGPT's advanced natural language processing capabilities, this research aims to streamline and refine the diagnostic process. Specifically, we compared ChatGPT's performance with that of conventional supervised learning models, including BERT, a model acclaimed for its effectiveness in various natural language processing tasks. We showed that ChatGPT substantially outperformed these models, achieving over 13% improvement in both accuracy and F1 score in a zero-shot learning configuration. This marked improvement highlights the model's potential as a superior tool for neurological diagnostics. Additionally, we identified ten distinct features of autism-associated language disorders that vary significantly across different experimental scenarios. These features, which included echolalia, pronoun reversal, and atypical language usage, were crucial for accurately diagnosing autism spectrum disorder (ASD) and customizing treatment plans. Together, our findings advocate for adopting sophisticated AI tools like ChatGPT in clinical settings to assess and diagnose developmental disorders. Our approach not only promises greater diagnostic precision but also aligns with the goals of personalized medicine, potentially transforming the evaluation landscape for autism and similar neurological conditions.

[146]  arXiv:2405.01803 [pdf, other]
Title: How to Gain Commit Rights in Modern Top Open Source Communities?
Comments: 23 pages, 5 figures, FSE 2024
Journal-ref: Proceedings of the ACM on Software Engineering (PACMSE) Issue FSE 2024
Subjects: Software Engineering (cs.SE)

The success of open source software (OSS) projects relies on voluntary contributions from various community roles. Being a committer signifies gaining trust and higher privileges. Many studies have focused on the requirements for becoming a committer, but most of them are based on interviews or several hypotheses, lacking a comprehensive understanding of committers' qualifications. We explore both the policies and practical implementations of committer qualifications in modern top OSS communities. Through a thematic analysis of these policies, we construct a taxonomy of committer qualifications, consisting of 26 codes categorized into nine themes, including Personnel-related to Project, Communication, and Long-term Participation. We also highlight the variations in committer qualifications emphasized in different OSS community governance models. For example, projects following the core maintainer model value project comprehension, while projects following the company-backed model place significant emphasis on user issue resolution. Then, we propose eight sets of metrics and perform survival analysis on two representative OSS projects to understand how these qualifications are implemented in practice. We find that the probability of gaining commit rights decreases as participation time passes. The selection criteria in practice are generally consistent with the community policies. Developers who submit high-quality code, actively engage in code review, and make extensive contributions to related projects are more likely to be granted commit rights. However, some qualifications do not align precisely, and some are not adequately evaluated. This study contributes to the understanding of trust establishment in modern top OSS communities, assists communities in better allocating commit rights, and supports developers in achieving self-actualization through OSS participation.
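In this kind of survival analysis, the "event" is a contributor gaining commit rights, and contributors who have not (yet) gained them are treated as censored. As a minimal sketch, assuming a Kaplan-Meier estimator (the abstract does not name the specific survival model), the following Python computes S(t), the estimated probability of still lacking commit rights after time t. The input durations are made up for illustration.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) = P(no commit rights yet after time t).

    durations: time each contributor was observed (e.g. months of participation)
    events: 1 if commit rights were granted at that time, 0 if censored
    Returns a list of (event_time, survival probability) steps.
    """
    observations = sorted(zip(durations, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        granted = removed = 0
        # Group all contributors whose observation ends at time t.
        while i < len(observations) and observations[i][0] == t:
            granted += observations[i][1]
            removed += 1
            i += 1
        if granted:
            survival *= 1.0 - granted / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both granted and censored contributors leave the risk set
    return curve


print(kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0]))  # approx. [(1, 0.8), (2, 0.6), (3, 0.3)]
```

The per-step factors 1 - granted/at_risk are empirical hazards; a hazard that shrinks over time corresponds to the finding that the chance of gaining commit rights decreases as participation time passes.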

[147]  arXiv:2405.01805 [pdf, ps, other]
Title: Crafting Tomorrow's Evaluations: Assessment Design Strategies in the Era of Generative AI
Comments: 5 pages, 2 figures
Subjects: Computers and Society (cs.CY)

Generative AI (GenAI) has gained the attention of a myriad of users in almost every profession. Its advancement has had an intense impact on education, significantly disrupting assessment design and evaluation methodologies. Despite the potential benefits and possibilities of GenAI in the education sector, there are several concerns, primarily centred around academic integrity, authenticity, equity of access, assessment evaluation methodology, and feedback. Consequently, academia is encountering challenges in designing assessments that retain academic integrity in the age of GenAI. In this article, we discuss the challenges and opportunities that need to be addressed in assessment design and evaluation. The article also highlights the importance of clear policy on the use of GenAI in completing assessment tasks, and of design approaches that ensure academic integrity and subject learning. Additionally, this article provides a categorisation of assessments based on the use of GenAI to cultivate knowledge among students and academic professionals. It also provides information on the skills necessary to formulate and articulate problems and evaluate the task, enabling students and academics to effectively utilise GenAI tools.

[148]  arXiv:2405.01807 [pdf, other]
Title: Algorithmic Decision-Making under Agents with Persistent Improvement
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

This paper studies algorithmic decision-making under humans' strategic behavior, where a decision-maker uses an algorithm to make decisions about human agents, and the latter, with information about the algorithm, may exert effort strategically and improve to receive favorable decisions. Unlike prior works that assume agents benefit from their efforts immediately, we consider realistic scenarios where the impacts of these efforts are persistent and agents benefit from efforts by making improvements gradually. We first develop a dynamic model to characterize persistent improvements and, based on it, construct a Stackelberg game to model the interplay between agents and the decision-maker. We analytically characterize the equilibrium strategies and identify conditions under which agents have incentives to improve. With these dynamics, we then study how the decision-maker can design an optimal policy to incentivize the largest improvements within the agent population. We also extend the model to settings where 1) agents may be dishonest and game the algorithm into making favorable but erroneous decisions; and 2) honest efforts are forgettable and not sufficient to guarantee persistent improvements. With the extended models, we further examine conditions under which agents prefer honest efforts over dishonest behavior, as well as the impacts of forgettable efforts.

[149]  arXiv:2405.01808 [pdf, other]
Title: GRAND Massive Parallel Decoding Framework for Low Latency in Beyond 5G
Comments: Accepted at 15th International Conference on Ubiquitous and Future Networks (ICUFN 2024)
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

We propose a massively parallel GRAND decoding framework. The framework introduces two novelties: 1. a likelihood function for $M$-QAM demodulated signals that effectively reduces the symbol error pattern space from $\mathcal{O}(5^{N/\log_2 M})$ down to $\mathcal{O}(4^{N/\log_2 M})$; and 2. a massively parallel matrix-vector multiplication for matrices of size $K\times N$ ($K \leq N$) that performs the multiplication in just $\mathcal{O}(\log_2 N)$ steps. We then apply the proposed GRAND approach to codes and operational modulation techniques used in the current 5G NR standard. Our framework is applicable not just to short codewords but to the full range of codewords from 32 bits up to 1024 bits used in the control channels of 5G NR. We also present simulation results with parity-check matrices of Polar codes with rate $R=1/2$ obtained from the 5G NR universal reliability sequence.
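The core GRAND idea is to guess noise patterns from most to least likely and stop at the first one that turns the received word into a codeword. The minimal hard-decision Python sketch below assumes a binary symmetric channel (so patterns are tried in order of increasing Hamming weight) and a toy (7,4) Hamming code, rather than the 5G NR Polar codes and M-QAM likelihoods used in the paper. The membership test is a parity-check matrix-vector product over GF(2); each row of that product is an XOR reduction, which can in principle be evaluated as a tree of depth about log2 N.

```python
from itertools import combinations

# Parity-check matrix of the (7,4) Hamming code: column j is the binary form of j+1.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(word):
    """H @ word over GF(2)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def grand_decode(received, max_weight=3):
    """Try error patterns in increasing Hamming weight; return the first codeword hit."""
    n = len(received)
    for weight in range(max_weight + 1):
        for flips in combinations(range(n), weight):
            candidate = list(received)
            for j in flips:
                candidate[j] ^= 1
            if not any(syndrome(candidate)):
                return candidate, flips
    return None, None  # abandon: no codeword within max_weight flips


codeword = [1, 1, 1, 1, 1, 1, 1]  # all-ones is a Hamming(7,4) codeword
received = codeword.copy()
received[2] ^= 1                  # one bit flipped by the channel
print(grand_decode(received))     # ([1, 1, 1, 1, 1, 1, 1], (2,))
```

Because GRAND only needs a codebook membership test, the same guessing loop works for any linear code given its parity-check matrix; the code-specific part is just `H`.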

[150]  arXiv:2405.01809 [pdf, ps, other]
Title: A Logic of Sattestation
Comments: 18 pages. Extended version (including proofs) of paper to appear in CSF'24
Subjects: Cryptography and Security (cs.CR); Logic in Computer Science (cs.LO)

We introduce a logic for reasoning about contextual trust for web addresses, provide a Kripke semantics for it, and prove its soundness under reasonable assumptions about principals' policies.
Self-Authenticating Traditional Addresses (SATAs) are valid DNS addresses or URLs that are generally meaningful -- to both humans and web infrastructure -- and contain a commitment to a public key in the address itself. Trust in web addresses is currently established via domain name registration, TLS certificates, and other hierarchical elements of the internet infrastructure. SATAs support such structural roots of trust but also complementary contextual roots associated with descriptive properties. The existing structural roots leave web connections open to a variety of well-documented and significant hijack vulnerabilities. Contextual trust roots provide, among other things, stronger resistance to such vulnerabilities.
We also consider labeled SATAs, which include descriptive properties such as that a SATA is an address for a news organization, a site belonging to a particular government or company, a site with information about a certain topic, etc. Our logic addresses both trust in the bound-together identity of the address and trust in the binding of labels to it. Our logic allows reasoning about delegation of trust with respect to specified labels, relationships between labels that provide more or less specific information, and the interaction between these two aspects.
In addition to soundness, we prove that if a principal trusts a particular identity (possibly with label), then either this trust is initially assumed, or there is a trust chain of delegations to this from initial trust assumptions. We also present an algorithm that effectively derives all possible trust statements from the set of initial trust assumptions and show it to be sound, complete, and terminating.
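Deriving all trust statements from initial assumptions is, in spirit, a fixpoint computation. As a heavily simplified and hypothetical sketch (the two rules are assumptions for illustration, not the paper's logic or its Kripke semantics), the Python below saturates a set of trust facts under (1) delegation: if P delegates label L to Q and Q trusts address S with L, then P trusts S with L; and (2) label generalization: trust with a more specific label implies trust with a less specific one. Because the rules only build facts from finitely many principals, addresses, and labels, the loop terminates.

```python
def saturate(trusts, delegations, more_specific):
    """Compute the closure of trust facts under two illustrative rules.

    trusts: set of (principal, address, label)
    delegations: set of (principal, delegate, label)
    more_specific: set of (specific_label, general_label) pairs
    """
    facts = set(trusts)
    changed = True
    while changed:
        changed = False
        new = set()
        for p, q, label in delegations:  # rule 1: delegation
            for q2, s, l2 in facts:
                if q2 == q and l2 == label:
                    new.add((p, s, label))
        for p, s, label in facts:  # rule 2: generalize labels
            for specific, general in more_specific:
                if label == specific:
                    new.add((p, s, general))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


# Hypothetical principals, address, and labels.
facts = saturate(
    trusts={("bob", "news.example", "news/local")},
    delegations={("alice", "bob", "news")},
    more_specific={("news/local", "news")},
)
print(sorted(facts))
```

Here alice ends up trusting `news.example` with the label `news` only because bob's more specific trust is first generalized and then delegated, which mirrors the interaction between delegation and label specificity that the logic is meant to capture.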

[151]  arXiv:2405.01810 [pdf, other]
Title: Non-linear Welfare-Aware Strategic Learning
Authors: Tian Xie, Xueru Zhang
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper studies algorithmic decision-making in the presence of strategic individual behaviors, where an ML model is used to make decisions about human agents and the latter can adapt their behavior strategically to improve their future data. Existing results on strategic learning have largely focused on the linear setting, where agents with linear labeling functions best respond to a (noisy) linear decision policy. Instead, this work focuses on general non-linear settings where agents respond to the decision policy with only "local information" about the policy. Moreover, we simultaneously consider the objectives of maximizing decision-maker welfare (model prediction accuracy), social welfare (agent improvement caused by strategic behaviors), and agent welfare (the extent to which ML underestimates the agents). We first generalize the agent best-response model of previous works to the non-linear setting, then reveal the compatibility of the welfare objectives. We show that the three welfare objectives can attain the optimum simultaneously only under restrictive conditions that are challenging to achieve in non-linear settings. The theoretical results imply that existing works that solely maximize the welfare of a subset of parties inevitably diminish the welfare of the others. We thus argue for the necessity of balancing the welfare of each party in non-linear settings and propose an irreducible optimization algorithm suitable for general strategic learning. Experiments on synthetic and real data validate the proposed algorithm.
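To illustrate what responding with only local information can mean, here is a minimal Python sketch under assumed modeling choices (not the paper's model): an agent with features x faces a possibly non-linear score f, pays a quadratic cost (c/2)||delta||^2, and uses only the local gradient of f, so its first-order best response is delta = grad f(x) / c. For a linear policy f(x) = w.x this coincides with the exact best response delta = w / c; for non-linear f it is only a local approximation. Central finite differences stand in for autodiff so the sketch has no dependencies, and both policies are hypothetical.

```python
import math


def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def local_best_response(f, x, cost=2.0):
    """First-order best response: maximize f(x) + grad.delta - (cost/2)|delta|^2."""
    grad = numerical_gradient(f, x)
    return [xi + gi / cost for xi, gi in zip(x, grad)]


def linear_policy(x):
    return 0.5 * x[0] + 1.5 * x[1]


def nonlinear_policy(x):
    return 1.0 / (1.0 + math.exp(-(x[0] * x[1] - 1.0)))  # logistic score


x0 = [1.0, 2.0]
print(local_best_response(linear_policy, x0))     # approx. [1.25, 2.75]
print(local_best_response(nonlinear_policy, x0))
```

In the non-linear case the agent's move depends on where it currently sits on the score surface, which is one reason welfare guarantees from the linear setting need not carry over.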

[152]  arXiv:2405.01813 [pdf, other]
Title: Towards Building Autonomous Data Services on Azure
Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. It also covers the future endeavors we plan to undertake and unresolved issues that still need attention.

[153]  arXiv:2405.01814 [pdf, other]
Title: Efficient and Economic Large Language Model Inference with Attention Offloading
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators. This mismatch arises from the autoregressive nature of LLMs, where the generation phase comprises operators with varying resource demands. Specifically, the attention operator is memory-intensive, exhibiting a memory access pattern that clashes with the strengths of modern accelerators, especially as context length increases. To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading. This approach leverages a collection of cheap, memory-optimized devices for the attention operator while still utilizing high-end accelerators for other parts of the model. This heterogeneous setup ensures that each component is tailored to its specific workload, maximizing overall performance and cost efficiency. Our comprehensive analysis and experiments confirm the viability of splitting the attention computation over multiple devices. Also, the communication bandwidth required between heterogeneous devices proves to be manageable with prevalent networking technologies. To further validate our theory, we develop Lamina, an LLM inference system that incorporates attention offloading. Experimental results indicate that Lamina can provide 1.48x-12.1x higher estimated throughput per dollar than homogeneous solutions.
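A back-of-the-envelope calculation shows why decode-phase attention is memory-bound. Assuming standard multi-head attention without KV-head sharing (grouped-query attention changes the numbers), each layer caches one key and one value vector of size d_model per token. Generating one token reads the whole cache while doing about 2 FLOPs (a multiply and an add) per cached element for QK^T and again for the weighted sum over V, so the arithmetic intensity is about 2 / bytes_per_element FLOPs per byte regardless of context length. The Python sketch below uses a hypothetical 7B-class configuration (32 layers, d_model = 4096, fp16) and ignores the softmax.

```python
def kv_cache_bytes_per_token(n_layers, d_model, bytes_per_elem):
    """Key + value vectors for every layer (standard multi-head attention)."""
    return 2 * n_layers * d_model * bytes_per_elem


def decode_attention_cost(context_len, n_layers, d_model, bytes_per_elem):
    """FLOPs and bytes to attend over the cache when generating ONE new token."""
    # q.k for every cached key, then weights times every cached value:
    # d_model multiply-adds (2 FLOPs each) per cached token, twice, per layer.
    flops = 2 * 2 * context_len * n_layers * d_model
    bytes_read = context_len * kv_cache_bytes_per_token(n_layers, d_model, bytes_per_elem)
    return flops, bytes_read


n_layers, d_model, fp16 = 32, 4096, 2  # hypothetical 7B-class configuration
per_token = kv_cache_bytes_per_token(n_layers, d_model, fp16)
print(per_token / 2**20, "MiB of KV cache per token")  # 0.5 MiB

for context_len in (1_024, 32_768):
    flops, bytes_read = decode_attention_cost(context_len, n_layers, d_model, fp16)
    print(context_len, bytes_read / 2**30, "GiB read,", flops / bytes_read, "FLOP/byte")
```

About 1 FLOP per byte is far below the compute-to-bandwidth ratio of current high-end accelerators (on the order of 100 or more FLOPs per byte), which is why offloading attention to cheaper, memory-optimized devices can pay off.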

[154]  arXiv:2405.01815 [pdf, other]
Title: Toward end-to-end interpretable convolutional neural networks for waveform signals
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models, presenting advancements in efficiency and explainability. In benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent. It can potentially replace Mel-Frequency Cepstral Coefficients (MFCC) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer using the PhysioNet Heart Sound Database, illustrating its ability to handle and capture intricate long waveform patterns. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.

[155]  arXiv:2405.01817 [pdf, other]
Title: Uniformly Stable Algorithms for Adversarial Training and Beyond
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)

In adversarial machine learning, neural networks suffer from a significant issue known as robust overfitting, where the robust test accuracy decreases over epochs (Rice et al., 2020). Recent research (Xing et al., 2021; Xiao et al., 2022) has focused on studying the uniform stability of adversarial training. Their investigations revealed that SGD-based adversarial training fails to exhibit uniform stability, and the derived stability bounds align with the observed phenomenon of robust overfitting in experiments. This motivates us to develop uniformly stable algorithms specifically tailored for adversarial training. To this end, we introduce Moreau envelope-$\mathcal{A}$ (ME-$\mathcal{A}$), a variant of the Moreau envelope-type algorithm. We employ a Moreau envelope function to reframe the original problem as a min-min problem, separating the non-strong convexity and non-smoothness of the adversarial loss. This approach then alternates between solving the inner and outer minimization problems to achieve uniform stability without incurring additional computational overhead. In practical scenarios, we show the efficacy of ME-$\mathcal{A}$ in mitigating robust overfitting. Beyond its application in adversarial training, this represents a fundamental result in uniform stability analysis, as ME-$\mathcal{A}$ is the first algorithm to exhibit uniform stability for weakly-convex, non-smooth problems.
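The Moreau envelope of a function $f$ with parameter $\lambda > 0$ is $M_\lambda f(x) = \min_y \{ f(y) + \frac{1}{2\lambda}\|x - y\|^2 \}$, so minimizing $M_\lambda f$ over $x$ is a min-min problem. The minimal NumPy sketch below evaluates the envelope for the toy nonsmooth function $f(x) = |x|$. For this function, the inner minimizer is soft-thresholding and the envelope equals the Huber function. The sketch illustrates the smoothing only. It does not reproduce ME-$\mathcal{A}$ or the adversarial loss, and the function names are hypothetical.

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal operator of lam*|.|, i.e. soft-thresholding (the inner minimizer y*)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_envelope_abs(x, lam):
    """M_lam f(x) = min_y |y| + (x - y)^2 / (2 lam), evaluated at y* = prox(x)."""
    y = prox_abs(x, lam)
    return np.abs(y) + (x - y) ** 2 / (2 * lam)

def moreau_grad_abs(x, lam):
    """Gradient of the envelope: (x - prox(x)) / lam, defined everywhere, including x = 0."""
    return (x - prox_abs(x, lam)) / lam

def huber(x, lam):
    """Closed form of the envelope of |x|: quadratic near 0, linear in the tails."""
    return np.where(np.abs(x) <= lam, x ** 2 / (2 * lam), np.abs(x) - lam / 2)
```

Because $f$ is convex here, the gradient $(x - \mathrm{prox}_{\lambda f}(x))/\lambda$ is $(1/\lambda)$-Lipschitz. The envelope is therefore smooth even at the kink of $f$.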

[156]  arXiv:2405.01819 [pdf, other]
Title: Sequencer Level Security
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Current blockchains do not provide any security guarantees to the smart contracts and their users as far as the content of the transactions is concerned. In the spirit of decentralization and censorship resistance, they follow the paradigm of including valid transactions in blocks without any further scrutiny. Rollups are a special kind of blockchains whose primary purpose is to scale the transaction throughput. Many of the existing rollups operate through a centrally operated sequencing protocol. In this paper, we introduce the Sequencer Level Security (SLS) protocol, an enhancement to sequencing protocols of rollups. This pioneering contribution explores the concept of the sequencer's capability to identify and temporarily quarantine malicious transactions instead of including them in blocks immediately. We describe the mechanics of the protocol for both the transactions submitted to the rollup mempool, as well as transactions originating from Layer one. We comment on topics such as trust and decentralization, and consider the security impact on the protocol itself. We implement a prototype of the SLS protocol, Zircuit, which is built on top of Geth and the OP stack. The SLS protocol described can be easily generalized to other rollup designs, and can be used for purposes other than security.

[157]  arXiv:2405.01820 [pdf, ps, other]
Title: Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent Circumvention
Journal-ref: FAccT '24, June 03--06, 2024, Rio de Janeiro, Brazil
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Machine learning systems require representations of the real world for training and testing: they require data, and lots of it. Collecting data at scale has logistical and ethical challenges, and synthetic data promises a solution to these challenges. Instead of needing to collect photos of real people's faces to train a facial recognition system, a model creator could create and use photo-realistic, synthetic faces. The comparative ease of generating this synthetic data rather than relying on collecting data has made it a common practice. We present two key risks of using synthetic data in model development. First, we detail the high risk of false confidence when using synthetic data to increase dataset diversity and representation. We base this on an examination of a real-world use case of synthetic data, in which synthetic datasets were generated for an evaluation of facial recognition technology. Second, we examine how using synthetic data risks circumventing consent for data usage. We illustrate this by considering the importance of consent to the U.S. Federal Trade Commission's regulation of data collection and affected models. Finally, we discuss how these two risks exemplify how synthetic data complicates existing governance and ethical practice; by decoupling data from those it impacts, synthetic data is prone to consolidating power away from those most impacted by algorithmically-mediated harm.

[158]  arXiv:2405.01824 [pdf, other]
Title: Creation of Novel Soft Robot Designs using Generative AI
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Soft robotics has emerged as a promising field with the potential to revolutionize industries such as healthcare and manufacturing. However, designing effective soft robots presents challenges, particularly in managing the complex interplay of material properties, structural design, and control strategies. Traditional design methods are often time-consuming and may not yield optimal designs. In this paper, we explore the use of generative AI to create 3D models of soft actuators. We create a dataset of over 70 text-shape pairings of soft pneumatic robot actuator designs, and adapt a latent diffusion model (SDFusion) to learn the data distribution and generate novel designs from it. By employing transfer learning and data augmentation techniques, we significantly improve the performance of the diffusion model. These findings highlight the potential of generative AI in designing complex soft robotic systems, paving the way for future advancements in the field.

[159]  arXiv:2405.01825 [pdf, other]
Title: Improving Concept Alignment in Vision-Language Concept Bottleneck Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Concept Bottleneck Models (CBMs) map the input image to a high-level, human-understandable concept space and then make class predictions based on these concepts. Recent approaches automate the construction of CBMs by prompting Large Language Models (LLMs) to generate text concepts and then using Vision Language Models (VLMs) to obtain concept scores to train a CBM. However, to make CBMs more trustworthy, it is desirable to build them with concepts defined by human experts instead of LLM-generated concepts. In this work, we take a closer look at the faithfulness of VLM concept scores for such expert-defined concepts in domains like fine-grained bird species classification and animal classification. Our investigations reveal that frozen VLMs, like CLIP, struggle to correctly associate a concept with the corresponding visual input despite achieving high classification performance. To address this, we propose a novel Contrastive Semi-Supervised (CSS) learning method which uses a few labeled concept examples to improve concept alignment (activate truthful visual concepts) in the CLIP model. Extensive experiments on three benchmark datasets show that our approach substantially increases concept accuracy and classification accuracy, yet requires only a fraction of the human-annotated concept labels. To further improve classification performance, we also introduce a new class-level intervention procedure for fine-grained classification problems that identifies the confounding classes and intervenes in their concept space to reduce errors.

[160]  arXiv:2405.01827 [pdf, other]
Title: SoftMCL: Soft Momentum Contrastive Learning for Fine-grained Sentiment-aware Pre-training
Comments: Accepted by LREC-COLING 2024
Subjects: Computation and Language (cs.CL)

Pre-training of language models captures general language understanding but fails to distinguish the affective impact of a particular context on a specific word. Recent works have sought to introduce contrastive learning (CL) for sentiment-aware pre-training in acquiring affective information. Nevertheless, these methods present two significant limitations. First, limited GPU memory capacity often restricts the number of negative samples, hindering the opportunity to learn good representations. In addition, using only a few sentiment polarities as hard labels, e.g., positive, neutral, and negative, to supervise CL will force all representations to converge to a few points, leading to latent space collapse. This study proposes soft momentum contrastive learning (SoftMCL) for fine-grained sentiment-aware pre-training. Instead of hard labels, we introduce valence ratings as soft-label supervision for CL to measure fine-grained sentiment similarities between samples. SoftMCL is applied at both the word and sentence level to enhance the model's ability to learn affective information. A momentum queue is introduced to expand the set of contrastive samples, allowing more negatives to be stored and used to overcome the limitations of hardware platforms. Extensive experiments on four different sentiment-related tasks demonstrate the effectiveness of the proposed SoftMCL method. The code and data of the proposed SoftMCL are available at: https://www.github.com/wangjin0818/SoftMCL/.
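The abstract mentions a momentum queue that stores past samples as extra negatives, in the style of momentum contrast. A minimal sketch of the two generic pieces follows, assuming parameters are stored as a dictionary of NumPy arrays. The first piece is a fixed-size FIFO queue, and the second is the exponential-moving-average update of the key encoder. The abstract does not specify how SoftMCL converts valence ratings into soft supervision. `soft_targets` is therefore a hypothetical weighting (closer valence, larger target weight) included only for illustration.

```python
from collections import deque
import numpy as np

class NegativeQueue:
    """FIFO queue of past key embeddings and their valence ratings, reused as negatives."""

    def __init__(self, size):
        self.keys = deque(maxlen=size)       # oldest entries are dropped automatically
        self.valences = deque(maxlen=size)

    def enqueue(self, keys, valences):
        for k, v in zip(keys, valences):
            self.keys.append(k)
            self.valences.append(v)

def momentum_update(key_params, query_params, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q, applied to every parameter array."""
    return {name: m * key_params[name] + (1 - m) * query_params[name] for name in key_params}

def soft_targets(anchor_valence, queue_valences, tau=1.0):
    """Hypothetical soft labels: a softmax over negative valence distances to the anchor."""
    dist = np.abs(np.asarray(queue_valences, dtype=float) - anchor_valence)
    logits = -dist / tau
    w = np.exp(logits - logits.max())
    return w / w.sum()
```

The queue decouples the number of negatives from the batch size, so memory grows with the stored embeddings rather than with the samples processed per step.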

[161]  arXiv:2405.01828 [pdf, other]
Title: FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Facial Expression Recognition (FER) plays a pivotal role in understanding human emotional cues. However, traditional FER methods based on visual information have limitations: they rely on separate preprocessing, feature extraction, and multi-stage classification procedures, which not only increase computational complexity but also require a significant amount of computing resources. Because Convolutional Neural Network (CNN)-based FER schemes frequently prove inadequate at identifying the deep, long-distance dependencies embedded within facial expression images, and Transformers have inherently quadratic computational complexity, this paper presents the FER-YOLO-Mamba model, which integrates the principles of Mamba and YOLO to facilitate efficient coordination of facial expression image recognition and localization. Within the FER-YOLO-Mamba model, we further devise a FER-YOLO-VSS dual-branch module, which combines the inherent strengths of convolutional layers in local feature extraction with the exceptional capability of State Space Models (SSMs) in revealing long-distance dependencies. To the best of our knowledge, this is the first Vision Mamba model designed for facial expression detection and classification. To evaluate the performance of the proposed FER-YOLO-Mamba model, we conducted experiments on two benchmark datasets, RAF-DB and SFEW. The experimental results indicate that the FER-YOLO-Mamba model achieved better results than other models. The code is available from https://github.com/SwjtuMa/FER-YOLO-Mamba.

[162]  arXiv:2405.01838 [pdf, other]
Title: A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion
Subjects: Machine Learning (cs.LG)

Recent developments in adversarial machine learning have highlighted the importance of building robust AI systems to protect against increasingly sophisticated attacks. While frameworks like AI Guardian are designed to defend against these threats, they often rely on assumptions that can limit their effectiveness. For example, they may assume attacks only come from one direction or include adversarial images in their training data. Our proposal suggests a different approach to the AI Guardian framework. Instead of including adversarial examples in the training process, we propose training the AI system without them. This aims to create a system that is inherently resilient to a wider range of attacks. Our method focuses on a dynamic defense strategy using stable diffusion that learns continuously and models threats comprehensively. We believe this approach can lead to a more generalized and robust defense against adversarial attacks.
In this paper, we outline our proposed approach, including the theoretical basis, experimental design, and expected impact on improving AI security against adversarial threats.

[163]  arXiv:2405.01839 [pdf, other]
Title: SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning
Comments: AAAI 2024 Cooperative Multi-Agent Systems Decision-Making and Learning (CMASDL) Workshop
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Multi-agent systems (MAS) need to adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most multi-agent systems cannot easily handle these challenges, due to the complexity of the state and task space. Social impact theory regards the complex influencing factors as forces acting on an agent, emanating from the environment, other agents, and the agent's intrinsic motivation, collectively referred to as the social force. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To model the social forces non-trivially, we further introduce a data-driven method, where we employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, the agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into widely used multi-agent reinforcement learning algorithms, e.g., MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they demonstrate transferability across diverse tasks, 3) they facilitate credit assignment in challenging reward settings, and 4) they scale with an increasing number of agents.
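Denoising score matching (DSM) trains a model $s(\tilde{x})$ to regress the target $-(\tilde{x} - x)/\sigma^2$, where $\tilde{x} = x + \sigma\varepsilon$ is a noise-perturbed sample. The minimizer of this regression is the score (gradient of the log-density) of the noised data. The one-dimensional NumPy sketch below fits a linear score to offline Gaussian samples, and the result can be checked against the closed-form answer $-(x - \mu)/(s^2 + \sigma^2)$. It is a toy illustration of DSM itself, assuming Gaussian data and a linear model. It does not reproduce the SocialGFs network or training setup.

```python
import numpy as np

def dsm_fit_linear_score(x, sigma, rng):
    """Fit s(x) = a*x + b by least squares on the denoising score matching target."""
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    target = -(x_noisy - x) / sigma**2          # score of the Gaussian perturbation kernel
    design = np.stack([x_noisy, np.ones_like(x_noisy)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return a, b

rng = np.random.default_rng(0)
mu, s, sigma = 2.0, 1.0, 0.5
x = rng.normal(mu, s, size=200_000)             # stand-in for offline samples
a, b = dsm_fit_linear_score(x, sigma, rng)
# Exact score of the noised data N(mu, s^2 + sigma^2) is -(x - mu) / (s^2 + sigma^2),
# i.e. a = -1 / (s^2 + sigma^2) and b = mu / (s^2 + sigma^2).
```

No density is ever evaluated; only samples and their noisy copies are needed. This matches the abstract's point that the fields can be learned from offline data.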

[164]  arXiv:2405.01840 [pdf, ps, other]
Title: An Essay concerning machine understanding
Subjects: Artificial Intelligence (cs.AI)

Artificial intelligence systems exhibit many useful capabilities, but they appear to lack understanding. This essay describes how we could go about constructing a machine capable of understanding. As John Locke (1689) pointed out, words are signs for ideas, which we can paraphrase as thoughts and concepts. To understand a word is to know and be able to work with the underlying concepts for which it is an indicator. Understanding between a speaker and a listener occurs when the speaker casts his or her concepts into words and the listener recovers approximately those same concepts. Current models rely on the listener to construct any potential meaning. The diminution of behaviorism as a psychological paradigm and the rise of cognitivism provide examples of many experimental methods that can be used to determine whether and to what extent a machine might understand and to make suggestions about how that understanding might be instantiated.

[165]  arXiv:2405.01842 [pdf, ps, other]
Title: SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
Subjects: Computation and Language (cs.CL)

To address the limitations of current hate speech detection models, we introduce \textsf{SGHateCheck}, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. \textsf{SGHateCheck} reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly for Singapore and Southeast Asia contexts.

[166]  arXiv:2405.01843 [pdf, ps, other]
Title: Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
Comments: arXiv admin note: text overlap with arXiv:2306.10486
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: \textbf{M}ulti-layer neural network parametrization for actor/critic, \textbf{M}arkovian sampling, \textbf{C}ontinuous state-action spaces, the performance of the \textbf{L}ast iterate, and \textbf{G}lobal optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (i.e., satisfies the MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left({\epsilon^{-3}}\right)$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.

[167]  arXiv:2405.01844 [pdf, other]
Title: A Survey on Privacy-Preserving Caching at Network Edge: Classification, Solutions, and Challenges
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Caching content at the network edge is a popular and effective technique widely deployed to alleviate the burden on the network backhaul, shorten service delay, and improve service quality. However, there has been some controversy over privacy violations in caching content at the network edge. On the one hand, the multi-access open edge network provides an ideal surface for external attackers to obtain private data from the edge cache by extracting sensitive information. On the other hand, privacy can be infringed by curious edge caching providers through caching trace analysis aimed at achieving better caching performance or higher profits. Therefore, an in-depth understanding of privacy issues in edge caching networks is vital and indispensable for creating a privacy-preserving caching service at the network edge. In this article, we are among the first to fill this gap by examining privacy-preserving techniques for caching content at the network edge. First, we provide an introduction to the background of Privacy-Preserving Edge Caching (PPEC). Next, we summarize the key privacy issues and present a taxonomy for caching at the network edge from the perspective of private data. Additionally, we conduct a retrospective review of the state-of-the-art countermeasures against privacy leakage from content caching at the network edge. Finally, we conclude the survey and envision challenges for future research.

[168]  arXiv:2405.01847 [pdf, other]
Title: A Model-based Multi-Agent Personalized Short-Video Recommender System
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

A recommender selects and presents the top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it with a reinforcement learning (RL) framework has attracted increasing attention from both academic and industry communities. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch-time in an environment of multi-aspect user preferences through a collaborative multi-agent formulation. Moreover, our proposed framework adopts a model-based learning approach to alleviate sample selection bias, which is a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our proposed approach has been deployed on our real large-scale short-video sharing platform, successfully serving hundreds of millions of users.

[169]  arXiv:2405.01848 [pdf, other]
Title: RankSHAP: a Gold Standard Feature Attribution Method for the Ranking Task
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Several works propose various post-hoc, model-agnostic explanations for the task of ranking, i.e., the task of ordering a set of documents, via feature attribution methods. However, these attributions correlate only weakly and sometimes contradict each other. In classification/regression, several works focus on \emph{axiomatic characterization} of feature attribution methods, showing that a certain method uniquely satisfies a set of desirable properties. However, no such efforts have been made in the space of feature attributions for the task of ranking. We take an axiomatic game-theoretic approach, popular in the feature attribution community, to identify candidate attribution methods for ranking tasks. We first define desirable axioms: Rank-Efficiency, Rank-Missingness, Rank-Symmetry and Rank-Monotonicity, all variants of the classical Shapley axioms. Next, we introduce Rank-SHAP, a feature attribution algorithm for the general ranking task, which extends classical Shapley values. We identify a polynomial-time algorithm for computing approximate Rank-SHAP values and evaluate the computational efficiency and accuracy of our algorithm under various scenarios. We also evaluate its alignment with human intuition with a user study. Lastly, we theoretically examine popular rank attribution algorithms, EXS and Rank-LIME, and evaluate their capacity to satisfy the classical Shapley axioms.
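The abstract does not describe Rank-SHAP's value function or its polynomial-time approximation. For reference, the sketch below computes classical Shapley values exactly by averaging each feature's marginal contribution over all feature orderings. It uses a hypothetical ranking value function: the fraction of document pairs that a feature subset orders the same way as the full model. The documents, features, and scores are invented for illustration. The $O(n!)$ enumeration also shows why an efficient approximation matters when a ranking model has many features.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings (O(n!))."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            phi[p] += value(grown) - value(coalition)
            coalition = grown
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Hypothetical toy ranking model: a document's score is the sum of its features.
DOCS = {
    "d1": {"bm25": 3.0, "freshness": 0.0, "clicks": 1.0},
    "d2": {"bm25": 1.0, "freshness": 2.0, "clicks": 0.0},
    "d3": {"bm25": 0.0, "freshness": 1.0, "clicks": 1.0},
}
FEATURES = ["bm25", "freshness", "clicks"]

def score(doc, coalition):
    return sum(DOCS[doc][f] for f in coalition)

def ranking_value(coalition):
    """Fraction of strictly ordered pairs (under all features) that the coalition orders the same way."""
    docs = list(DOCS)
    agree = total = 0
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            full = score(a, FEATURES) - score(b, FEATURES)
            if full == 0:
                continue                      # ties in the full model are ignored
            total += 1
            part = score(a, coalition) - score(b, coalition)
            agree += (full * part) > 0        # strict agreement only; partial ties count as 0
    return agree / total

phi = shapley_values(FEATURES, ranking_value)
```

In this toy setup, the empty feature set has value 0 and the full set has value 1. The classical efficiency axiom therefore forces the three attributions to sum to 1.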

[170]  arXiv:2405.01849 [pdf, ps, other]
Title: Stability of Explainable Recommendation
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Explainable recommendation has been gaining attention over the last few years in industry and academia. Explanations provided along with recommendations in a recommender system framework have many uses, in particular reasoning why a suggestion is provided and how well an item aligns with a user's personalized preferences. Hence, explanations can play a huge role in influencing users to purchase products. However, the reliability of the explanations under varying scenarios has not been rigorously verified from an empirical perspective. Unreliable explanations can have serious consequences, such as attackers leveraging explanations to manipulate and tempt users into purchasing target items that the attackers want to promote. In this paper, we study the vulnerability of existing feature-oriented explainable recommenders, particularly analyzing their performance under different levels of external noise added to model parameters. We conducted experiments analyzing three important state-of-the-art (SOTA) explainable recommenders trained on two widely used e-commerce recommendation datasets of different scales. We observe that all the explainable models are vulnerable to increased noise levels. Experimental results verify our hypothesis that the ability to explain recommendations decreases as noise levels increase, and that adversarial noise in particular causes a much stronger decrease. Our study presents an empirical verification on the topic of robust explanations in recommender systems, which can be extended to different types of explainable recommenders.

[171]  arXiv:2405.01851 [pdf, other]
Title: Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.

[172]  arXiv:2405.01852 [pdf, ps, other]
Title: Tokenization of Real Estate Assets Using Blockchain
Journal-ref: IJIIT vol.18, no.3 2022: pp.1-12.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Emerging Technologies (cs.ET)

Blockchain technology is one of the key technologies that have revolutionized various facets of society, such as banking, healthcare, and other critical ecosystems. One area that can harness blockchain is the real estate sector. The most lucrative long-term investment is real estate, followed by gold, equities, mutual funds, and savings accounts. Nevertheless, real estate carries administrative overheads such as lack of transparency, fraud, multiple intermediaries, title issues, paperwork, an increasing number of arbitrations, and lack of liquidity. This paper proposes a framework that uses blockchain as an underlying technology. With the aid of blockchain and its suite of tools, many of these problems in the real estate investment ecosystem can be alleviated. These tools include smart contracts, immutable record management, tokenization, record tracking, and time-stamped storage. Tokenization of real estate lowers the entry barrier by increasing liquidity and interoperability and by improving the interaction between various stakeholders.

[173]  arXiv:2405.01855 [pdf, ps, other]
Title: Robust Explainable Recommendation
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Explainable recommender systems are an important field of study that provides reasons behind the suggested recommendations. Explanations with recommender systems are useful for developers while debugging anomalies within the system and for consumers while interpreting the model's effectiveness in capturing their true preferences towards items. However, most existing state-of-the-art (SOTA) explainable recommenders cannot retain their explanation capability under noisy circumstances and, moreover, do not generalize across different datasets. The robustness of the explanations must be ensured so that malicious attackers cannot manipulate high-stakes decision scenarios to their advantage, which could cause severe consequences affecting large groups of interest. In this work, we present a general framework for feature-aware explainable recommenders that can withstand external attacks and provide robust and generalized explanations. This paper presents a novel framework which could be utilized as an additional defense tool, preserving global explainability when subject to model-based white-box attacks. Our framework is simple to implement and supports different methods regardless of the internal model structure and intrinsic utility within any model. We evaluated our framework on two architecturally different feature-based SOTA explainable algorithms by training them on three popular e-commerce datasets of increasing scale. We noticed that both algorithms displayed an overall improvement in the quality and robustness of global explainability under normal as well as noisy environments across all the datasets, indicating the flexibility and mutability of our framework.

[174]  arXiv:2405.01857 [pdf, other]
Title: TinySeg: Model Optimizing Framework for Image Segmentation on Tiny Embedded Systems
Comments: LCTES 2024
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Image segmentation is one of the major computer vision tasks and is applicable in a variety of domains, such as autonomous navigation of an unmanned aerial vehicle. However, image segmentation cannot easily materialize on tiny embedded systems because image segmentation models generally have high peak memory usage due to their architectural characteristics. This work finds that, under an existing tiny machine learning framework, image segmentation models require unnecessarily large memory space; that is, the existing framework cannot effectively manage the memory space for image segmentation models.
This work proposes TinySeg, a new model optimizing framework that enables memory-efficient image segmentation for tiny embedded systems. TinySeg analyzes the lifetimes of tensors in the target model and identifies long-living tensors. Then, TinySeg optimizes the memory usage of the target model mainly with two methods: (i) tensor spilling into local or remote storage and (ii) fused fetching of spilled tensors. This work implements TinySeg on top of the existing tiny machine learning framework and demonstrates that TinySeg can reduce the peak memory usage of an image segmentation model by 39.3% for tiny embedded systems.
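The tensor lifetime analysis described above can be sketched on a toy operator graph. The Python sketch below makes three assumptions. It uses a hypothetical U-Net-style skip connection and invented tensor sizes in arbitrary units. It also uses a simplified spilling model in which a spilled tensor occupies memory only at the steps that produce or read it. The sketch ignores fetch latency, storage placement, and TinySeg's fused fetching, so it is not TinySeg's actual optimizer.

```python
def lifetimes(ops):
    """Map each tensor to (first step it is resident, last step it is touched)."""
    life = {}
    for step, (_, inputs, outputs) in enumerate(ops):
        for t in inputs + outputs:
            start, _ = life.get(t, (step, step))
            life[t] = (start, step)
    return life

def long_lived(ops, min_span):
    """Tensors whose lifetime spans at least `min_span` operator steps."""
    return {t for t, (s, e) in lifetimes(ops).items() if e - s >= min_span}

def peak_memory(ops, sizes, spilled=frozenset()):
    """Peak total size of resident tensors over all steps.
    Simplification: a spilled tensor is resident only at steps that produce or read it."""
    life = lifetimes(ops)
    peak = 0
    for step, (_, inputs, outputs) in enumerate(ops):
        touched = set(inputs) | set(outputs)
        resident = [t for t, (start, end) in life.items()
                    if start <= step <= end and (t not in spilled or t in touched)]
        peak = max(peak, sum(sizes[t] for t in resident))
    return peak

# Hypothetical graph: tensor "a" is reused by the final concat (a skip connection).
OPS = [
    ("conv1", ["x"], ["a"]),
    ("conv2", ["a"], ["b"]),
    ("conv3", ["b"], ["c"]),
    ("conv4", ["c"], ["d"]),
    ("concat", ["d", "a"], ["e"]),
]
SIZES = {"x": 4, "a": 8, "b": 10, "c": 2, "d": 4, "e": 6}

print(long_lived(OPS, 2))                         # {'a'}
print(peak_memory(OPS, SIZES))                    # 20
print(peak_memory(OPS, SIZES, spilled={"a"}))     # 18
```

Here tensor `a` is the only long-lived tensor. Spilling it lowers the peak from 20 to 18 units because it no longer occupies memory during steps 2 and 3.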

[175]  arXiv:2405.01858 [pdf, other]
Title: SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

Sexual education aims to foster a healthy lifestyle in terms of emotional, mental and social well-being. In countries like India, adolescents form the largest demographic group and face significant vulnerabilities concerning sexual health. Unfortunately, sexual education is often stigmatized, creating barriers to providing essential counseling and information to this at-risk population. Consequently, issues such as early pregnancy, unsafe abortions, sexually transmitted infections, and sexual violence become prevalent. Our current proposal aims to provide a safe and trustworthy platform for sexual education to the vulnerable rural Indian population, thereby fostering the healthy overall growth of the nation. In this regard, we strive towards designing SUKHSANDESH, a multi-staged AI-based Question Answering platform for sexual education tailored to rural India, adhering to safety guardrails and regional language support. By utilizing information retrieval techniques and large language models, SUKHSANDESH will deliver effective responses to user queries. We also propose to anonymise the dataset to address safety concerns and set AI guardrails against any harmful or unwanted response generation. Moreover, an innovative feature of our proposal involves integrating ``avatar therapy'' with SUKHSANDESH. This feature will convert AI-generated responses into real-time audio delivered by an animated avatar speaking regional Indian languages. This approach aims to foster empathy and connection, which is particularly beneficial for individuals with limited literacy skills. Partnering with Gram Vaani, an industry leader, we will deploy SUKHSANDESH to address sexual education needs in rural India.

[176]  arXiv:2405.01859 [pdf, other]
Title: AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
Comments: 9 pages, in ICML 2024
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

The recent embrace of machine learning (ML) in the development of autonomous weapons systems (AWS) creates serious risks to geopolitical stability and the free exchange of ideas in AI research. This topic has received comparatively little attention of late relative to risks stemming from superintelligent artificial general intelligence (AGI), but it requires fewer assumptions about the course of technological development and is thus a nearer-future issue. ML is already enabling the substitution of AWS for human soldiers in many battlefield roles, reducing the upfront human cost, and thus the political cost, of waging offensive war. In the case of peer adversaries, this increases the likelihood of "low intensity" conflicts that risk escalating into broader warfare. In the case of non-peer adversaries, it reduces the domestic blowback to wars of aggression. This effect can occur regardless of other ethical issues around the use of military AI, such as the risk of civilian casualties, and does not require any superhuman AI capabilities. Further, the military value of AWS raises the specter of an AI-powered arms race and the misguided imposition of national security restrictions on AI research. Our goal in this paper is to raise awareness among the public and ML researchers of the near-future risks posed by full or near-full autonomy in military technology, and we provide regulatory suggestions to mitigate these risks. We call upon AI policy experts and the defense AI community in particular to embrace transparency and caution in their development and deployment of AWS to avoid the negative effects on global stability and AI research that we highlight here.

[177]  arXiv:2405.01867 [pdf, ps, other]
Title: Cyber Security in Energy Informatics: A Non-technical Perspective
Subjects: Cryptography and Security (cs.CR)

The literature on cyber security, including cyber security in energy informatics, is technocentric and may miss the bigger picture of cyber security measures. This research therefore conducts a literature review focusing on non-technical issues in cyber security in the energy informatics field. The findings show that seven non-technical issues have been discussed in the literature, including education, awareness, policy, standards, human factors, and risks, challenges, and solutions. These findings can be valuable not only for researchers but also for managers, policy makers, and educators.

[178]  arXiv:2405.01868 [pdf, other]
Title: Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems
Comments: Main paper 8 pages; References and Appendix 9 pages; 7 figures and 14 tables
Subjects: Computation and Language (cs.CL)

This paper aims to enable large language models (LLMs) to efficiently use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks in 1) generating grounded responses with recommendation-oriented knowledge and 2) proactively leading conversations through different dialogue goals. In this work, we first analyze these limitations through a comprehensive evaluation, showing the necessity of external knowledge and goal guidance, which contribute significantly to recommendation accuracy and language quality. In light of this finding, we propose a novel ChatCRS framework that decomposes the complex CRS task into several sub-tasks through 1) a knowledge retrieval agent that uses a tool-augmented approach to reason over external knowledge bases and 2) a goal-planning agent for dialogue goal prediction. Experimental results on two multi-goal CRS datasets reveal that ChatCRS sets new state-of-the-art benchmarks, improving language quality in informativeness by 17% and proactivity by 27%, and achieving a tenfold improvement in recommendation accuracy.

[179]  arXiv:2405.01870 [pdf, other]
Title: Detecting and Deterring Manipulation in a Cognitive Hierarchy
Comments: 11 pages, 5 figures
Subjects: Multiagent Systems (cs.MA); Computer Science and Game Theory (cs.GT)

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper reasoning and more sophisticated opponent modelling. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework, $\aleph$-IPOMDP, which augments model-based RL agents' Bayesian inference with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results show the $\aleph$ mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.

[180]  arXiv:2405.01872 [pdf, other]
Title: Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Steel surface defect recognition is an industrial problem of great practical value. Data insufficiency is the major challenge in training a robust defect recognition network. Existing methods have attempted to enlarge the dataset by generating samples with generative models, but their generation quality is still limited by the scarcity of defect image samples. To this end, we propose Stable Surface Defect Generation (StableSDG), which transfers the vast generation distribution embedded in the Stable Diffusion model to steel surface defect image generation. To tackle the distinctive distribution gap between steel surface images and images generated by the diffusion model, we propose two processes. First, we align the distributions by adapting the parameters of the diffusion model in both the token embedding space and the network parameter space. Second, in the generation process, we propose image-oriented generation rather than generation from pure Gaussian noise. We conduct extensive experiments on a steel surface defect dataset, demonstrating state-of-the-art performance in generating high-quality samples and training recognition models, and showing that both designed processes contribute significantly to performance.

[181]  arXiv:2405.01873 [pdf, other]
Title: Enhancing Bangla Language Next Word Prediction and Sentence Completion through Extended RNN with Bi-LSTM Model On N-gram Language
Comments: This paper contains 6 pages, 8 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Texting stands out as the most prominent form of communication worldwide. Individuals spend a significant amount of time writing whole texts to send emails or post on social media, which is time-consuming. Word prediction and sentence completion can make writing Bangla text easier and more convenient. This paper expands the scope of Bangla language processing by introducing a Bi-LSTM model that effectively handles Bangla next-word prediction and Bangla sentence generation, demonstrating its versatility and potential impact. We propose a new Bi-LSTM model to predict the following word and complete a sentence. We constructed a corpus dataset from various news portals, including bdnews24, BBC News Bangla, and Prothom Alo. The proposed approach achieved superior results in word prediction, reaching 99% accuracy for both 4-gram and 5-gram word predictions. Moreover, it demonstrated significant improvement over existing methods, achieving 35%, 75%, and 95% accuracy for uni-gram, bi-gram, and tri-gram word prediction, respectively.

[182]  arXiv:2405.01874 [pdf, other]
Title: Automated Control Logic Test Case Generation using Large Language Models
Subjects: Software Engineering (cs.SE)

Testing PLC and DCS control logic in industrial automation is laborious and challenging, since appropriate test cases are often complex and difficult to formulate. Researchers have previously proposed several automated test case generation approaches for PLC software that apply symbolic execution and search-based techniques. Because they often require formal specifications and perform a mechanical analysis of programs, these approaches may uncover specific programming errors but sometimes suffer from state space explosion and cannot handle fairly informal specifications. We propose a novel approach for the automatic generation of PLC test cases that queries a Large Language Model (LLM) to synthesize test cases for code provided in a prompt. Experiments with ten open-source function blocks from the OSCAT automation library showed that the approach is fast, easy to use, and can yield test cases with high statement coverage for programs of low to medium complexity. However, we also found that LLM-generated test cases suffer from erroneous assertions in many cases, which still require manual adaptation.

[183]  arXiv:2405.01882 [pdf, other]
Title: Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter challenges in handling sparse point clouds, achieving real-time continuous classification, and coping with limited monitoring ranges when statically mounted. To overcome these limitations, we propose RobHAR, a movable robot-mounted mmWave radar system with lightweight deep neural networks for real-time monitoring of human activities. Specifically, we first propose a sparse point cloud-based global embedding to learn the features of point clouds using the light-PointNet (LPN) backbone. Then, we learn the temporal pattern with a bidirectional lightweight LSTM model (BiLiLSTM). In addition, we implement a transition optimization strategy, integrating a Hidden Markov Model (HMM) with Connectionist Temporal Classification (CTC) to improve the accuracy and robustness of continuous HAR. Our experiments on three datasets indicate that our method significantly outperforms previous methods in both discrete and continuous HAR tasks. Finally, we deploy our system on a movable robot-mounted edge computing platform, achieving flexible healthcare monitoring in real-world scenarios.
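To show how an HMM can turn flickering per-frame predictions into a consistent activity sequence, here is a generic Viterbi decoder in Python. It treats the frame classifier's class probabilities as emission scores (a common simplification), and the two classes, transition matrix, and prior are invented for illustration. It is not RobHAR's combined HMM and CTC transition optimization.

```python
import numpy as np


def viterbi(frame_probs, trans, prior):
    """Most likely state sequence given per-frame class probabilities.

    frame_probs: (T, K) per-frame class probabilities, used as emission scores
    trans:       (K, K) transition matrix, trans[i, j] = P(next = j | current = i)
    prior:       (K,) initial state distribution
    """
    eps = 1e-12  # avoid log(0)
    log_e = np.log(frame_probs + eps)
    log_t = np.log(trans + eps)
    T, K = frame_probs.shape
    score = np.log(prior + eps) + log_e[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_t          # cand[i, j]: best path ending i -> j
        back[t] = np.argmax(cand, axis=0)      # best predecessor for each state j
        score = cand[back[t], np.arange(K)] + log_e[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):              # follow back-pointers to the start
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Two hypothetical activity classes; frame 2 is a one-frame flicker to class 1.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.85, 0.15], [0.9, 0.1]])
trans = np.array([[0.95, 0.05], [0.05, 0.95]])  # "sticky" transitions
prior = np.array([0.5, 0.5])

print(probs.argmax(axis=1).tolist())  # [0, 0, 1, 0, 0]
print(viterbi(probs, trans, prior))   # [0, 0, 0, 0, 0]
```

The high self-transition probability (0.95) penalizes one-frame label switches, so the isolated class-1 frame is absorbed into the surrounding class-0 segment.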

[184]  arXiv:2405.01883 [pdf, other]
Title: DALLMi: Domain Adaption for LLM-based Multi-label Classifier
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large language models (LLMs) increasingly serve as the backbone for classifying text associated with distinct domains and, simultaneously, several labels (classes). When encountering domain shifts, e.g., moving a movie-review classifier from IMDb to Rotten Tomatoes, adapting such an LLM-based multi-label classifier is challenging due to incomplete label sets in the target domain and daunting training overhead. Existing domain adaptation methods address either image multi-label classifiers or text binary classifiers. In this paper, we design DALLMi, Domain Adaptation Large Language Model interpolator, a first-of-its-kind semi-supervised domain adaptation method for text data models based on LLMs, specifically BERT. The core of DALLMi is a novel variation loss and MixUp regularization, which jointly leverage the limited positively labeled text, the large quantity of unlabeled text and, importantly, their interpolation from the BERT word embeddings. DALLMi also introduces a label-balanced sampling strategy to overcome the imbalance between labeled and unlabeled data. We evaluate DALLMi against partially supervised and unsupervised approaches on three datasets under different scenarios of label availability for the target domain. Our results show that DALLMi achieves higher mAP than unsupervised and partially supervised approaches by 19.9% and 52.2%, respectively.
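As a rough illustration of MixUp-style interpolation in embedding space, the Python sketch below forms convex combinations of labeled and unlabeled embedding vectors and of their label vectors, with one coefficient per pair drawn from a Beta distribution. The random vectors standing in for BERT embeddings, the `alpha` value, the one-to-one pairing, and the constant soft labels for unlabeled text are assumptions; DALLMi's variation loss and its exact MixUp regularizer are defined in the paper, not here.

```python
import numpy as np


def mixup(emb_a, y_a, emb_b, y_b, alpha=0.4, rng=None):
    """MixUp: convex combinations of paired embeddings and their label vectors.

    emb_a, emb_b: (N, D) embeddings; y_a, y_b: (N, C) multi-hot or soft labels.
    One coefficient lam ~ Beta(alpha, alpha) is drawn per pair.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha, size=(emb_a.shape[0], 1))
    emb = lam * emb_a + (1.0 - lam) * emb_b
    y = lam * y_a + (1.0 - lam) * y_b
    return emb, y, lam


rng = np.random.default_rng(0)
pos_emb = rng.normal(size=(4, 8))  # stand-ins for embeddings of labeled text
pos_y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
unl_emb = rng.normal(size=(4, 8))  # stand-ins for embeddings of unlabeled text
unl_y = np.full((4, 3), 0.5)       # hypothetical soft labels for unlabeled text

mixed_emb, mixed_y, lam = mixup(pos_emb, pos_y, unl_emb, unl_y, rng=rng)
print(mixed_emb.shape, mixed_y.shape)  # (4, 8) (4, 3)
```

Because each coefficient lies in [0, 1], the mixed labels stay valid per-class probabilities, which is what lets an interpolated example carry a soft multi-label target.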

[185]  arXiv:2405.01884 [pdf, other]
Title: Beyond Single-Event Extraction: Towards Efficient Document-Level Multi-Event Argument Extraction
Subjects: Computation and Language (cs.CL)

Recent mainstream event argument extraction methods process each event in isolation, resulting in inefficient inference and ignoring the correlations among multiple events. To address these limitations, we propose a multiple-event argument extraction model, DEEIA (Dependency-guided Encoding and Event-specific Information Aggregation), capable of extracting arguments from all events within a document simultaneously. The proposed DEEIA model employs a multi-event prompt mechanism comprising DE and EIA modules. The DE module is designed to improve the correlation between prompts and their corresponding event contexts, whereas the EIA module provides event-specific information to improve contextual understanding. Extensive experiments show that our method achieves new state-of-the-art performance on four public datasets (RAMS, WikiEvents, MLEE, and ACE05) while significantly reducing inference time compared to the baselines. Further analyses demonstrate the effectiveness of the proposed modules.

[186]  arXiv:2405.01885 [pdf, other]
Title: Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning
Comments: accepted by IEEE Signal Processing Letters
Journal-ref: IEEE Signal Processing Letters (2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Psychological studies have shown that Micro Gestures (MG) are closely linked to human emotions. MG-based emotion understanding has attracted much attention because it allows for emotion understanding through nonverbal body gestures without relying on identity information (e.g., facial and electrocardiogram data). Therefore, it is essential to recognize MG effectively for advanced emotion understanding. However, existing Micro Gesture Recognition (MGR) methods use only a single modality (e.g., RGB or skeleton) while overlooking crucial textual information. In this letter, we propose a simple but effective visual-text contrastive learning solution that uses text information for MGR. In addition, instead of using handcrafted prompts for visual-text contrastive learning, we propose a novel module called Adaptive prompting to generate context-aware prompts. The experimental results show that the proposed method achieves state-of-the-art performance on two public datasets. Furthermore, based on an empirical study utilizing the results of MGR for emotion understanding, we demonstrate that using the textual results of MGR significantly improves performance, by over 6%, compared to directly using video as input.

[187]  arXiv:2405.01886 [pdf, other]
Title: Aloe: A Family of Fine-tuned Open Healthcare LLMs
Comments: Five appendices
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

As the capabilities of Large Language Models (LLMs) in healthcare and medicine continue to advance, there is a growing need for competitive open-source models that can safeguard the public interest. With the increasing availability of highly competitive open base models, the impact of continued pre-training is increasingly uncertain. In this work, we explore the role of instruct tuning, model merging, alignment, red teaming, and advanced inference schemes as means to improve current open models. To that end, we introduce the Aloe family, a set of open medical LLMs that are highly competitive within their scale range. Aloe models are trained on the current best base models (Mistral, LLaMA 3), using a new custom dataset that combines public data sources improved with synthetic Chain of Thought (CoT). Aloe models undergo an alignment phase, making them among the first few policy-aligned open healthcare LLMs to use Direct Preference Optimization and setting a new standard for ethical performance in healthcare LLMs. Model evaluation expands to include various bias and toxicity datasets, a dedicated red teaming effort, and a much-needed risk assessment for healthcare LLMs. Finally, to explore the limits of current LLMs in inference, we study several advanced prompt engineering strategies to boost performance across benchmarks, yielding state-of-the-art results for open healthcare 7B LLMs, unprecedented at this scale.

[188]  arXiv:2405.01888 [pdf, other]
Title: Securing the Open RAN Infrastructure: Exploring Vulnerabilities in Kubernetes Deployments
Subjects: Cryptography and Security (cs.CR)

In this paper, we investigate the security implications of virtualized and software-based Open Radio Access Network (RAN) systems, specifically focusing on the architecture proposed by the O-RAN ALLIANCE and O-Cloud deployments based on the O-RAN Software Community (OSC) stack and infrastructure. Our key findings are based on a thorough security assessment and static scanning of the OSC Near Real-Time RAN Intelligent Controller (RIC) cluster. We highlight the presence of potential vulnerabilities and misconfigurations in the Kubernetes infrastructure supporting the RIC, partly due to the use of outdated versions of software packages, and estimate their criticality using various deployment auditing frameworks (e.g., MITRE ATT&CK and NSA/CISA). In addition, we propose methodologies to minimize these issues and harden the Open RAN virtualization infrastructure. These include integrating security evaluation methods into the deployment process, implementing deployment hardening measures, and employing policy-based control for RAN components. We emphasize the need to address the problems found in order to improve the overall security of virtualized Open RAN systems.

[189]  arXiv:2405.01889 [pdf, ps, other]
Title: Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants
Comments: Master's thesis, DAI-Labor, Technische Universität Berlin
Subjects: Systems and Control (eess.SY)

The increasing demand for direct electric energy in the grid is also tied to the growing use of Electric Vehicles (EVs) in cities, which will eventually fully replace combustion-engine vehicles. However, the large amount of energy stored in EV batteries is not always used, and it can constitute a virtual power plant on its own. Bidirectional EVs with batteries connected to the grid can therefore charge or discharge energy depending on public needs, producing a smart shift of energy where and when it is needed. EVs employed as mobile storage devices can add resilience and supply/demand balance benefits to specific loads, in many cases as part of a Microgrid (MG). Depending on the direction of the energy transfer, EVs can provide backup power to households through vehicle-to-house (V2H) charging or store unused renewable power through renewable-to-vehicle (RE2V) charging. V2H and RE2V solutions can complement renewable power sources like solar photovoltaic (PV) panels and wind turbines (WT), whose output fluctuates over time, increasing self-consumption and autarky. The concept of distributed energy resources (DERs) is becoming increasingly prevalent and requires new solutions for integrating multiple complementary resources with variable supply over time. The development of these ideas is coupled with the growth of new AI techniques that could become the managing core of such systems. Machine learning techniques can model the energy grid environment flexibly enough to allow continuous optimization. This working principle introduces the wider concept of an interconnected, shared, decentralized energy grid. This research on Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants focuses on providing solutions for such energy supply optimization models.

[190]  arXiv:2405.01901 [pdf, ps, other]
Title: AI-generated art perceptions with GenFrame -- an image-generating picture frame
Comments: Design Research Society conference 2024 (DRS2024), Boston 24-28 June 2024
Subjects: Human-Computer Interaction (cs.HC)

Image-generation models are changing how we express ourselves in visual art. However, what people think of AI-generated art is still largely unexplored, especially compared to traditional art. In this paper, we present the design of an interactive research product, GenFrame, an image-generating picture frame that appears as a traditional painting but offers the viewer the agency to modify the depicted painting. We report on a study in which we deployed GenFrame in a traditional art museum and interviewed visitors about their views on AI art. When provoked by AI-generated art, people need more of the artist's backstory and emotional journey to consider the artwork commensurate with traditional art. However, generative AI-enabled interactive experiences open new ways of engaging with art when a turn of a dial can modify the art style or motifs of a painting. A demo can be seen here: https://youtu.be/1rhW4fazaBY.

[191]  arXiv:2405.01904 [pdf, other]
Title: Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts
Subjects: Social and Information Networks (cs.SI); Other Statistics (stat.OT)

This paper proposes a computational text classification strategy to identify references to social groups in European party manifestos and beyond. Our methodology uses machine learning techniques, including BERT and large language models, to capture group-based appeals in texts. We propose to combine automated identification of social groups using the Mistral-7B-v0.1 large language model with embedding-space-based filtering to extend a sample of core social groups to all social groups mentioned in party manifestos. By applying this approach to the group images in the manifestos of RRPs and mainstream parties, we explore whether electoral dynamics explain similarities in group appeals and potential convergence or divergence in party strategies. Contrary to expectations, increasing RRP support or mainstream parties' vote loss does not necessarily lead to convergence in group appeals. Nonetheless, our methodology enables mapping similarities in group appeals across time and space in 15 European countries from 1980 to 2021 and can be transferred to other use cases as well.
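Embedding-space filtering of the kind mentioned above can be sketched as a cosine-similarity test against the embeddings of core social groups. In the Python sketch below, the toy three-dimensional vectors, the candidate phrases, and the 0.8 threshold are hypothetical; the paper's actual embedding model and filtering criterion may differ.

```python
import numpy as np


def embedding_filter(candidates, core, threshold=0.8):
    """Keep candidate phrases whose embedding is close to some core-group embedding.

    candidates: dict mapping phrase -> embedding vector
    core:       (M, D) array, one row per core social group
    threshold:  minimum cosine similarity to any core group (hypothetical value)
    """
    core_n = core / np.linalg.norm(core, axis=1, keepdims=True)
    kept = []
    for phrase, vec in candidates.items():
        v = np.asarray(vec, dtype=float)
        sims = core_n @ (v / np.linalg.norm(v))  # cosine similarity to each core group
        if sims.max() >= threshold:
            kept.append(phrase)
    return kept


# Toy 3-d "embeddings"; a real pipeline would obtain these from a text encoder.
core = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
candidates = {
    "farmers": [0.9, 0.1, 0.0],
    "pensioners": [0.1, 0.95, 0.05],
    "the weather": [0.0, 0.1, 1.0],
}
print(embedding_filter(candidates, core))  # ['farmers', 'pensioners']
```

Raising the threshold trades recall for precision: fewer LLM-proposed phrases survive, but those that do are closer to the core groups.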

[192]  arXiv:2405.01905 [pdf, ps, other]
Title: Schwarz Methods for Nonlocal Problems
Comments: 29 pages, 9 figures
Subjects: Numerical Analysis (math.NA)

H. A. Schwarz developed the first domain decomposition method for partial differential equations as early as 1870. Here we consider a nonlocal Dirichlet problem with variable coefficients, where a nonlocal diffusion operator is used. We find that domain decomposition methods such as the so-called Schwarz methods seem to be a natural way to solve these nonlocal problems. In this work, we show convergence for nonlocal problems with specific symmetric kernels and present the implementation of the multiplicative and additive Schwarz algorithms in this nonlocal setting.
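Since the abstract does not specify the kernels or discretization, the following Python sketch is a toy: a constant kernel with horizon `r` on a uniform 1D grid with a homogeneous volume constraint, split into two overlapping index subdomains and solved with the textbook multiplicative and additive Schwarz iterations. The grid size, horizon, overlap, iteration counts, and damping `theta = 0.5` (suitable for two subdomains) are assumptions, not the paper's setup.

```python
import numpy as np


def nonlocal_matrix(n, r):
    """Discrete 1D nonlocal diffusion operator with a constant kernel (scaling omitted).

    Node i interacts with every node within r grid cells. Neighbours outside the
    domain satisfy the homogeneous volume constraint u = 0, so they contribute only
    to the diagonal. The result is symmetric positive definite.
    """
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * r  # one unit per neighbour, interior or exterior
        for j in range(max(0, i - r), min(n, i + r + 1)):
            if j != i:
                A[i, j] = -1.0
    return A


def multiplicative_schwarz(A, f, subdomains, iters=200):
    """Sequential overlapping subdomain solves; each one sees the latest iterate."""
    u = np.zeros_like(f)
    for _ in range(iters):
        for idx in subdomains:
            res = f - A @ u
            u[idx] += np.linalg.solve(A[np.ix_(idx, idx)], res[idx])
    return u


def additive_schwarz(A, f, subdomains, iters=500, theta=0.5):
    """All subdomain corrections use the same residual; theta damps their sum."""
    u = np.zeros_like(f)
    for _ in range(iters):
        res = f - A @ u
        du = np.zeros_like(f)
        for idx in subdomains:
            du[idx] += np.linalg.solve(A[np.ix_(idx, idx)], res[idx])
        u += theta * du
    return u


n, r = 40, 3
A = nonlocal_matrix(n, r)
f = np.ones(n)
subdomains = [np.arange(0, 24), np.arange(16, 40)]  # overlap of 8 nodes > r
u_ref = np.linalg.solve(A, f)
print(np.abs(multiplicative_schwarz(A, f, subdomains) - u_ref).max())
print(np.abs(additive_schwarz(A, f, subdomains) - u_ref).max())
```

Both printed errors should be near machine precision. The multiplicative variant sweeps the subdomains sequentially, while the additive variant computes all local corrections from one residual, which makes it naturally parallel but requires damping.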

[193]  arXiv:2405.01906 [pdf, other]
Title: Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization
Comments: 17 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The neural combinatorial optimization (NCO) approach has shown great potential for solving routing problems without requiring expert knowledge. However, existing constructive NCO methods cannot directly solve large-scale instances, which significantly limits their application prospects. To address this crucial shortcoming, this work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization. In particular, we design a powerful yet lightweight instance-conditioned adaptation module for the NCO model to generate better solutions for instances across different scales. In addition, we develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solutions. Experimental results show that our proposed method obtains excellent results with very fast inference in solving Traveling Salesman Problems (TSPs) and Capacitated Vehicle Routing Problems (CVRPs) across different scales. To the best of our knowledge, our model achieves state-of-the-art performance among all RL-based constructive methods for TSP and CVRP with up to 1,000 nodes.

[194]  arXiv:2405.01909 [pdf, other]
Title: Towards Sustainable Low Carbon Emission Mini Data Centres
Authors: Ismael Samaye (LIRMM | ADAC), Paul Leloup (LIRMM), Gilles Sassatelli (LIRMM | ADAC), Abdoulaye Gamatié (LIRMM | ADAC)
Journal-ref: ComPAS 2023 - Conférence francophone d'informatique en Parallélisme, Architecture et Système, Jul 2023, Annecy, France
Subjects: Hardware Architecture (cs.AR)

Mini data centres have become increasingly prevalent in diverse organizations in recent years. They can be easily deployed at large scale, with high resilience. They are also cost-effective and provide high-security protection. Meanwhile, advances in IT have produced ever more energy-efficient servers, leading to the periodic replacement of older-generation servers in mini data centres. However, the disposal of older servers creates electronic waste that further aggravates the already critical e-waste problem. Furthermore, despite the shift towards more energy-efficient servers, many mini data centres still rely heavily on high-carbon energy sources, which contributes to data centres' overall carbon footprint. All these issues are sustainability concerns. To address them, this paper proposes an approach to extend the lifespan of older-generation servers in mini data centres. This is made possible by a novel solar-powered computing technology, named Genesis, that compensates for the energy overhead generated by older servers. As a result, electronic waste can be reduced while improving system sustainability by reusing functional server hardware. Moreover, Genesis does not require server cooling, which reduces energy and water requirements. Analytical reasoning is applied to compare the efficiency of typical conventional mini data centre designs against alternative Genesis-based designs in terms of energy, carbon emissions, and operating costs.

[195]  arXiv:2405.01916 [pdf, other]
Title: Multi-objective Optimal Trade-off Between V2G Activities and Battery Degradation in Electric Mobility-as-a-Service Systems
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper presents optimization models for electric Mobility-as-a-Service systems, in which electric vehicles not only provide on-demand mobility but also perform charging and Vehicle-to-Grid (V2G) operations to enhance the fleet operator's profitability. Specifically, we formulate the optimal fleet operation problem as a mixed-integer linear program whose objective combines operational costs with revenues generated from servicing requests and grid electricity sales. Our cost function explicitly captures battery price and degradation, reflecting their impact on the fleet's total cost of ownership due to additional charging and discharging activities. Simulation results for Eindhoven, The Netherlands, show that integrating V2G activities does not compromise the number of travel requests being served. Moreover, we emphasize the significance of accounting for battery degradation, as the associated costs can potentially outweigh the revenues stemming from V2G operations.

[196]  arXiv:2405.01917 [pdf, other]
Title: A comparison of online search engine autocompletion in Google and Baidu
Subjects: Computers and Society (cs.CY)

Warning: This paper contains content that may be offensive or upsetting. Online search engine auto-completions make it faster for users to search and access information. However, they also have the potential to reinforce and promote stereotypes and negative opinions about a variety of social groups. We study the characteristics of search auto-completions in two different linguistic and cultural contexts: Baidu and Google. We find differences between the two search engines in the way they suppress or modify original queries, and we highlight a concerning presence of negative suggestions across all social groups. Our study highlights the need for more refined, culturally sensitive moderation strategies in current language technologies.

[197]  arXiv:2405.01918 [pdf, other]
Title: An Onboard Framework for Staircases Modeling Based on Point Clouds
Subjects: Robotics (cs.RO)

Detecting traversable regions on staircases and modeling their physical attributes are pivotal aspects of the mobility of legged robots. This paper presents an onboard framework tailored to detecting traversable regions and modeling the physical attributes of staircases from point cloud data. To mitigate the influence of illumination variations and overfitting caused by limited dataset diversity, a series of data augmentations is introduced to enhance the training of the fundamental network. A curvature suppression cross-entropy (CSCE) loss is proposed to reduce prediction ambiguity on the boundary between traversable and non-traversable regions. Moreover, a measurement correction based on the pose estimation of stairs is introduced to calibrate the raw modeling output, which is influenced by tilted perspectives. Lastly, we collect a staircase dataset and introduce new evaluation criteria. Through a series of rigorous experiments on this dataset, we substantiate the superior accuracy and generalization capabilities of our proposed method. Code, models, and datasets will be available at https://github.com/szturobotics/Stair-detection-and-modeling-project.

[198]  arXiv:2405.01919 [pdf, ps, other]
Title: Channel Orthogonalization in Panel-Based LIS
Comments: 6 pages, 3 figures. This work has been submitted to the IEEE for possible publication, copyright information may be affected upon publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Large intelligent surface (LIS) has gained momentum as a potential 6G-enabling technology that expands the benefits of massive multiple-input multiple-output (MIMO). Meanwhile, orthogonal space-division multiplexing (OSDM) may offer a promising direction for efficient exploitation of spatial resources, analogous to what orthogonal frequency-division multiplexing (OFDM) achieves in the frequency domain. To this end, we study how to enforce channel orthogonality in a panel-based LIS scenario. Our proposed method consists of a subset of active LIS panels coherently serving a set of users, and another subset of LIS panels operating in semi-passive mode by implementing a receive and re-transmit (RRTx) process. This results in an inter-symbol interference (ISI) channel, for which we characterize the semi-passive processing required to achieve simultaneous orthogonality in time and space. We then use the remaining degrees of freedom (DoFs) from the orthogonality constraint to minimize the semi-passive processing power, deriving a closed-form global minimizer that allows efficient implementation of the proposed scheme.

[199]  arXiv:2405.01920 [pdf, ps, other]
Title: Lightweight Change Detection in Heterogeneous Remote Sensing Images with Online All-Integer Pruning Training
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detection of changes in heterogeneous remote sensing images is vital, especially in response to emergencies like earthquakes and floods. Current homogeneous transformation-based change detection (CD) methods often suffer from high computation and memory costs, which makes them ill-suited to edge-computing devices such as onboard CD devices on satellites. To address this issue, this paper proposes a new lightweight CD method for heterogeneous remote sensing images that employs the online all-integer pruning (OAIP) training strategy to efficiently fine-tune the CD network using the current test data. The proposed CD network uses two visual geometry group (VGG) subnetworks as the backbone architecture. In the OAIP-based training process, all the weights, gradients, and intermediate data are quantized to integers to speed up training and reduce memory usage, and a per-layer block exponentiation scaling scheme is used to reduce the computation errors in network parameters caused by quantization. In addition, an adaptive filter-level pruning method based on the L1-norm criterion is employed to further lighten the fine-tuning process of the CD network. Experimental results show that the proposed OAIP-based method attains detection performance similar to that of state-of-the-art CD methods, with significantly reduced computation complexity and memory usage.
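The abstract names an L1-norm criterion for filter-level pruning without giving details. As a minimal sketch, assuming each filter is stored as a nested list `[in_channels][k][k]` and that a fixed keep ratio is used (the paper's method is adaptive, which this sketch does not model), ranking and selecting filters could look like this; `l1_norm` and `prune_filters_l1` are hypothetical names.

```python
def l1_norm(filter_weights):
    """Sum of absolute values over every weight in one filter [in_channels][k][k]."""
    total = 0.0
    for channel in filter_weights:
        for row in channel:
            for w in row:
                total += abs(w)
    return total


def prune_filters_l1(filters, keep_ratio):
    """Return the (sorted) indices of filters to keep, ranked by descending L1 norm.

    keep_ratio is a fixed, illustrative fraction; ties keep their original order.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))  # always keep at least one filter
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    return sorted(ranked[:n_keep])


# Four single-channel 2x2 filters with L1 norms 0.2, 4.0, 0.3 and 4.0.
filters = [
    [[[0.1, -0.1], [0.0, 0.0]]],
    [[[1.0, -2.0], [0.5, 0.5]]],
    [[[0.0, 0.0], [0.0, 0.3]]],
    [[[-1.0, 1.0], [1.0, -1.0]]],
]
print(prune_filters_l1(filters, 0.5))
```

With a keep ratio of 0.5, the two filters with the largest L1 norm (indices 1 and 3) survive; the low-norm filters are treated as the least important and removed.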

[200]  arXiv:2405.01923 [pdf, other]
Title: Task-Driven Computational Framework for Simultaneously Optimizing Design and Mounted Pose of Modular Reconfigurable Manipulators
Subjects: Robotics (cs.RO)

Modular reconfigurable manipulators enable quick adaptation and versatility across different application environments and can be tailored to the specific requirements of tasks. Task performance depends significantly on the manipulator's mounted pose and morphology design, which creates the need for methodologies that select suitable modular robot configurations and mounted poses to meet the specific task requirements and required performance. Morphological changes in modular robots can be derived through a discrete optimization process involving the selective addition or removal of modules. In contrast, the adjustment of the mounted pose operates within a continuous space, allowing for smooth and precise alterations in both orientation and position. This work introduces a computational framework that simultaneously optimizes modular manipulators' mounted pose and morphology. At the core of the work is a mapping function we design that implicitly captures the morphological state of manipulators in the continuous space. This transformation function unifies the optimization of mounted pose and morphology within a continuous space. Furthermore, our optimization framework incorporates an array of performance metrics, such as minimum joint effort and maximum manipulability, as well as considerations for trajectory execution error and physical and safety constraints. To highlight our method's benefits, we compare it with previous methods that framed such problems as combinatorial optimization problems and demonstrate its practicality in selecting the modular robot configuration for executing a drilling task with the CONCERT modular robotic platform.
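The abstract lists maximum manipulability among the performance metrics but does not define it. A common choice is Yoshikawa's measure, sqrt(det(J J^T)). The sketch below, assuming a hypothetical two-link planar arm with link lengths `l1` and `l2`, computes it from the arm's Jacobian; whether the paper uses this exact measure is an assumption.

```python
import math


def planar_2link_jacobian(q1, q2, l1, l2):
    """Position Jacobian of a planar two-link arm (end-effector x, y w.r.t. q1, q2)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def manipulability(jac):
    """Yoshikawa measure sqrt(det(J J^T)) for a 2x2 Jacobian."""
    (a, b), (c, d) = jac
    # Entries of the symmetric matrix J J^T.
    m11 = a * a + b * b
    m12 = a * c + b * d
    m22 = c * c + d * d
    det = m11 * m22 - m12 * m12
    return math.sqrt(max(det, 0.0))  # clamp tiny negative rounding errors


print(manipulability(planar_2link_jacobian(0.3, math.pi / 2, 1.0, 0.5)))
```

For this arm the measure reduces analytically to l1 * l2 * |sin(q2)|, so it peaks with the elbow at a right angle and vanishes at the singular, fully stretched configuration q2 = 0; an optimizer maximizing it is pushed away from such singularities.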

[201]  arXiv:2405.01924 [pdf, other]
Title: Semi-Parametric Retrieval via Binary Token Index
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored. To address these demands, we introduce Semi-parametric Vocabulary Disentangled Retrieval (SVDR). SVDR is a novel semi-parametric retrieval framework that supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. In our evaluation on three open-domain question answering benchmarks with the entire Wikipedia as the retrieval corpus, SVDR consistently demonstrates superior performance. It achieves a 3% higher top-1 retrieval accuracy than the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy than BM25 when using a binary token index. In addition, the adoption of a binary token index reduces index preparation time from 30 GPU hours to just 2 CPU hours and storage size from 31 GB to 2 GB, achieving a 90% reduction compared to an embedding-based index.
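The abstract does not describe how SVDR scores documents against its binary token index. The sketch below shows only the generic idea behind a presence-only ("binary") inverted index: each token maps to the set of documents containing it, and a query scores documents by how many distinct query tokens they contain. Whitespace tokenization and overlap counting are assumptions, and all function names are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Naive lower-cased whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def build_binary_index(docs):
    """Map each token to the set of document ids containing it (presence only, no counts)."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for tok in set(tokenize(text)):
            index[tok].add(doc_id)
    return index


def search(index, query, top_k=1):
    """Rank documents by the number of distinct query tokens they contain."""
    scores = defaultdict(int)
    for tok in set(tokenize(query)):
        for doc_id in index.get(tok, ()):  # .get does not insert missing keys
            scores[doc_id] += 1
    # Higher overlap first; ties broken by lower document id.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


docs = ["the cat sat on the mat", "dogs chase cats", "a cat and a dog"]
index = build_binary_index(docs)
print(search(index, "cat dog", top_k=2))
```

Because the index stores only token sets, building it needs no neural encoder, which illustrates why a binary token index can be set up quickly on CPUs; the paper's actual tokenization and scoring may differ.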

[202]  arXiv:2405.01925 [pdf, other]
Title: A Modular, Tendon Driven Variable Stiffness Manipulator with Internal Routing for Improved Stability and Increased Payload Capacity
Comments: To be presented at ICRA 2024, Yokohama, Japan. 6 pages
Subjects: Robotics (cs.RO)

Stability and reliable operation under a spectrum of environmental conditions remain open challenges for soft and continuum-style manipulators. The inability to carry sufficient load and to effectively reject external disturbances are two drawbacks that limit the scale of continuum designs, preventing widespread adoption of this technology. To tackle these problems, this work details the design and experimental testing of a modular, tendon-driven, bead-style continuum manipulator with tunable stiffness. By embedding the ability to independently control the stiffness of distinct sections of the structure, the manipulator can regulate its posture under greater loads, of up to 1 kg at the end-effector, than in the flexible state. Likewise, an internal routing scheme vastly improves the stability of the proximal segment when operating the distal segment, reducing deviations by at least 70.11%. Operation is validated when gravity is both tangential and perpendicular to the manipulator backbone, a feature uncommon in previous designs. The findings presented in this work are key to the development of larger-scale continuum designs, demonstrating that flexibility and tip stability under loading can coexist without compromise.

[203]  arXiv:2405.01926 [pdf, other]
Title: Auto-Encoding Morph-Tokens for Multimodal LLM
Comments: Accepted by ICML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge. This is due to a conflicting objective: for comprehension, an MLLM needs to abstract the visuals; for generation, it needs to preserve the visuals as much as possible. Visual-tokens thus face a dilemma. To resolve the conflict, we propose encoding images into morph-tokens to serve a dual purpose: for comprehension, they act as visual prompts instructing the MLLM to generate texts; for generation, they take on a different, non-conflicting role as complete visual-tokens for image reconstruction, where the missing visual cues are recovered by the MLLM. Extensive experiments show that morph-tokens can achieve a new SOTA for multimodal comprehension and generation simultaneously. Our project is available at https://github.com/DCDmllm/MorphTokens.

[204]  arXiv:2405.01927 [pdf, other]
Title: SlotGAT: Slot-based Message Passing for Heterogeneous Graph Neural Network
Comments: Published as a conference paper at ICML 2023
Subjects: Machine Learning (cs.LG)

Heterogeneous graphs are ubiquitous for modeling complex data, and there is an urgent need for powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node $v$ are forced to be transformed into the feature space of $v$ for aggregation, even though the neighbors are of different types. That is, the semantics of different node types are entangled in node $v$'s representation. To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction, demonstrating its superiority. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.

[205]  arXiv:2405.01929 [pdf, other]
Title: Novel Local Characteristic Decomposition Based Path-Conservative Central-Upwind Schemes
Comments: arXiv admin note: text overlap with arXiv:2307.16380
Subjects: Numerical Analysis (math.NA)

We introduce local characteristic decomposition based path-conservative central-upwind schemes for (nonconservative) hyperbolic systems of balance laws. The proposed schemes are made well-balanced via a flux globalization approach, in which source terms are incorporated into the fluxes; this helps to enforce the well-balanced property when the resulting quasi-conservative system is solved using the local characteristic decomposition based central-upwind scheme recently introduced in [A. Chertock, S. Chu, M. Herty, A. Kurganov, and M. Lukáčová-Medviďová, J. Comput. Phys., 473 (2023), Paper No. 111718]. Nonconservative product terms are also incorporated into the global fluxes using a path-conservative technique. We illustrate the performance of the developed schemes by applying them to one- and two-dimensional compressible multifluid systems and thermal rotating shallow water equations.

[206]  arXiv:2405.01930 [pdf, other]
Title: OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources
Subjects: Computation and Language (cs.CL)

This paper introduces OARelatedWork, the first large-scale multi-document summarization dataset for related work generation containing whole related work sections and full texts of cited papers. The dataset includes 94,450 papers and 5,824,689 unique referenced papers. It was designed for the task of automatically generating related work, to shift the field toward generating entire related work sections from all available content instead of generating parts of related work sections from abstracts only, which is the current mainstream for abstractive approaches in this field. We show that the estimated upper bound for extractive summarization increases by 217% in the ROUGE-2 score when using full content instead of abstracts. Furthermore, we show the benefits of full content data on naive, oracle, traditional, and transformer-based baselines. Long outputs, such as related work sections, pose challenges for automatic evaluation metrics like BERTScore due to the metrics' limited input length. We tackle this issue by proposing and evaluating a meta-metric using BERTScore. Despite operating on smaller blocks, this meta-metric correlates with human judgment comparably to the original BERTScore.
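The ROUGE-2 figure above measures bigram overlap between a generated text and a reference. As a minimal sketch of the standard recall-oriented definition (matched reference bigrams divided by total reference bigrams, with clipped counts), assuming plain lower-cased whitespace tokenization rather than the stemming and tokenization options of common ROUGE packages, with hypothetical function names:

```python
from collections import Counter


def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate, reference):
    """Clipped bigram overlap divided by the number of reference bigrams."""
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference bigram counts at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


print(rouge2_recall("the cat sat on a mat", "the cat sat on the mat"))
```

Here 3 of the 5 reference bigrams ("the cat", "cat sat", "sat on") appear in the candidate, giving a recall of 0.6.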

[207]  arXiv:2405.01934 [pdf, other]
Title: Impact of Architectural Modifications on Deep Learning Adversarial Robustness
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Rapid advancements in deep learning are accelerating its adoption in a wide variety of applications, including safety-critical applications such as self-driving vehicles, drones, robots, and surveillance systems. These advancements include variations of sophisticated techniques that improve model performance. However, such models are not immune to adversarial manipulations, which can cause the system to misbehave while going unnoticed by experts. The frequency of modifications to existing deep learning models necessitates thorough analysis of their impact on model robustness. In this work, we present an experimental evaluation of the effects of model modifications on deep learning model robustness using adversarial attacks. Our methodology examines the robustness of model variants against various adversarial attacks. Through these experiments, we aim to shed light on the critical issue of maintaining the reliability and safety of deep learning models in safety- and security-critical applications. Our results indicate a pressing need for an in-depth assessment of the effects of model changes on robustness.

[208]  arXiv:2405.01935 [pdf, ps, other]
Title: Cut elimination for Cyclic Proofs: A Case Study in Temporal Logic
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

We consider modal logic extended with the well-known temporal operator 'eventually' and provide a cut-elimination procedure for a cyclic sequent calculus that captures this fragment. The work showcases an adaptation of the reductive cut-elimination method to cyclic calculi. Notably, the proposed algorithm applies to a cyclic proof and directly outputs a cyclic cut-free proof without appealing to intermediate machinery for regularising the end proof.

[209]  arXiv:2405.01937 [pdf, other]
Title: An Attention Based Pipeline for Identifying Pre-Cancer Lesions in Head and Neck Clinical Images
Comments: 5 pages, 3 figures, accepted in ISBI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Early detection of cancer can improve patient prognosis by enabling early intervention. Head and neck cancer is diagnosed in specialist centres after a surgical biopsy; however, cases may be missed, leading to delayed diagnosis. To overcome these challenges, we present an attention-based pipeline that identifies suspected lesions, segments them, and classifies them as non-dysplastic, dysplastic, or cancerous. We propose (a) a vision-transformer-based Mask R-CNN network for lesion detection and segmentation of clinical images, and (b) a Multiple Instance Learning (MIL)-based scheme for classification. Current results show that the segmentation model produces segmentation masks and bounding boxes with an overlap accuracy score of up to 82% on unseen external test data, surpassing reviewed segmentation benchmarks. The classification scheme achieves an F1-score of 85% on the internal cohort test set. An app has been developed to perform lesion segmentation on images taken via a smart device. Future work involves employing endoscopic video data for precise early detection and prognosis.

[210]  arXiv:2405.01938 [pdf, other]
Title: Conservative semi-lagrangian finite difference scheme for transport simulations using graph neural networks
Comments: arXiv admin note: text overlap with arXiv:2309.04943
Subjects: Numerical Analysis (math.NA)

Semi-Lagrangian (SL) schemes are highly efficient for simulating transport equations and are widely used across various applications. Despite their success, designing genuinely multi-dimensional and conservative SL schemes remains a significant challenge. Building on our previous work [Chen et al., J. Comput. Phys., V490 112329, (2023)], we introduce a conservative machine-learning-based SL finite difference (FD) method that allows for extra-large time step evolution. At the core of our approach is a novel dynamical graph neural network designed to handle the complexities of accurately tracking upstream points along characteristics. This proposed neural transport solver learns the conservative SL FD discretization directly from data, improving accuracy and efficiency compared to traditional numerical schemes, while significantly simplifying algorithm implementation. We validate the method's effectiveness and efficiency through numerical tests on benchmark transport equations in both one and two dimensions, as well as the nonlinear Vlasov-Poisson system.
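As a classical point of reference (not the learned, conservative scheme proposed here), one semi-Lagrangian step for constant-velocity advection u_t + a u_x = 0 on a periodic 1-D grid traces each node back along its characteristic and linearly interpolates the old solution at the departure point. The sketch assumes constant velocity, which is what makes linear interpolation mass-conserving in this special case; the function name is hypothetical. The CFL number a*dt/dx may exceed 1, the large-time-step property the abstract highlights.

```python
import math


def sl_advect_periodic(u, velocity, dt, dx):
    """One semi-Lagrangian step for u_t + a*u_x = 0 on a periodic grid.

    Node i traces back to x_i - a*dt and linearly interpolates u there.
    """
    n = len(u)
    shift = velocity * dt / dx  # departure offset in cells; may exceed 1
    k = math.floor(shift)
    theta = shift - k           # fractional part, in [0, 1)
    new = []
    for i in range(n):
        # The departure point (i - k - theta) lies between nodes i-k-1 and i-k.
        left = u[(i - k - 1) % n]
        right = u[(i - k) % n]
        new.append((1.0 - theta) * right + theta * left)
    return new


# CFL number 2.5: a time step that an explicit Eulerian upwind scheme could not take.
print(sl_advect_periodic([0.0, 1.0, 0.0, 0.0, 0.0, 0.0], 2.5, 1.0, 1.0))
```

With constant velocity the interpolation weights (1 - theta) and theta are the same at every node, so the total sum of u is preserved; with spatially varying velocity this no longer holds, which is one reason conservative SL schemes are harder to design.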

[211]  arXiv:2405.01940 [pdf, other]
Title: On the Relative Completeness of Satisfaction-based Quantum Hoare Logic
Comments: 35 pages
Subjects: Logic in Computer Science (cs.LO)

Quantum Hoare logic (QHL) is a formal verification tool specifically designed to ensure the correctness of quantum programs. Achieving a relatively complete satisfaction-based QHL with while-loops has been an ongoing challenge since its inception in 2006. This paper presents a solution by proposing the first relatively complete satisfaction-based QHL with while-loops. The completeness is proved in two steps. First, we establish a semantics and proof system of Hoare triples with quantum programs and deterministic assertions. Then, by utilizing the weakest precondition of deterministic assertions, we construct the weakest preterm calculus of probabilistic expressions. The relative completeness of QHL is then obtained as a consequence of the weakest preterm calculus. Using our QHL, we formally verify the correctness of Deutsch's algorithm and quantum teleportation.

[212]  arXiv:2405.01942 [pdf, other]
Title: CRCL at SemEval-2024 Task 2: Simple prompt optimizations
Journal-ref: SemEval-2024
Subjects: Computation and Language (cs.CL)

We present a baseline for the SemEval 2024 task 2 challenge, whose objective is to ascertain the inference relationship between pairs of clinical trial report sections and statements. We apply prompt optimization techniques with LLM Instruct models provided as a Language Model-as-a-Service (LMaaS). We observed, in line with recent findings, that synthetic CoT prompts significantly improve on manually crafted ones.

[213]  arXiv:2405.01943 [pdf, other]
Title: Dependency-Aware Semi-Structured Sparsity of GLU Variants in Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The rapid advancement in Large Language Models (LLMs) has markedly enhanced the capabilities of language understanding and generation. However, the substantial model size poses hardware challenges, affecting both the memory required for serving and the inference latency of token generation. To address these challenges, we propose Dependency-aware Semi-structured Sparsity (DaSS), a novel method for pruning the now-prevalent SwiGLU-based LLMs. Our approach incorporates structural dependency into weight magnitude-based unstructured pruning. We introduce an MLP-specific pruning metric that evaluates the importance of each weight by jointly considering its magnitude and its corresponding MLP intermediate activation norms. DaSS strikes a balance between the adaptability offered by unstructured pruning and the structural consistency inherent in dependency-based structured pruning. Empirical evaluations on the Mistral and LLaMA2 model families demonstrate that DaSS not only outperforms both SparseGPT and Wanda in achieving hardware-friendly N:M sparsity patterns but also maintains the computational efficiency of Wanda.
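The abstract describes scoring each weight by its magnitude together with a corresponding activation norm and targeting N:M sparsity. As a minimal sketch in the spirit of Wanda-style scoring (score = |w| * activation norm), assuming the per-input-column norms are already computed and passed in as `act_norms`, the snippet below keeps the N highest-scoring weights in every group of M consecutive inputs. It does not reproduce DaSS's dependency-aware choice of which norms to use; `nm_prune` is a hypothetical name.

```python
def nm_prune(weights, act_norms, n=2, m=4):
    """Zero all but the n highest-scoring weights in each group of m consecutive inputs.

    weights: rows of a weight matrix (list of lists), one row per output unit.
    act_norms: one precomputed activation norm per input column (assumed given).
    """
    pruned = []
    for row in weights:
        if len(row) % m:
            raise ValueError("row length must be a multiple of m")
        new_row = list(row)
        for start in range(0, len(row), m):
            group = range(start, start + m)
            # Wanda-style importance: weight magnitude scaled by activation norm.
            ranked = sorted(group, key=lambda j: abs(row[j]) * act_norms[j], reverse=True)
            for j in ranked[n:]:
                new_row[j] = 0.0
        pruned.append(new_row)
    return pruned


weights = [[0.5, -0.1, 0.2, 0.3, 1.0, 1.0, -1.0, 1.0]]
act_norms = [1.0, 10.0, 1.0, 1.0, 1.0, 2.0, 3.0, 0.5]
print(nm_prune(weights, act_norms))
```

In the first group, pure magnitude pruning would keep 0.5 and 0.3, but the large activation norm on column 1 makes the small weight -0.1 important, so 0.5 and -0.1 are kept instead. Every group of four ends up with exactly two nonzeros, the 2:4 pattern that sparse hardware kernels expect.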

[214]  arXiv:2405.01948 [pdf, ps, other]
Title: Common Randomness Generation from Sources with Infinite Polish Alphabet
Comments: arXiv admin note: text overlap with arXiv:2210.04556
Subjects: Information Theory (cs.IT)

We investigate the problem of common randomness (CR) generation in the basic two-party communication setting in which a sender and a receiver aim to agree on a common random variable with high probability. The terminals observe independent and identically distributed (i.i.d.) samples of sources with an arbitrary distribution defined on a Polish alphabet and are allowed to communicate as little as possible over a noisy, memoryless channel. We establish single-letter upper and lower bounds on the CR capacity for the specified model. The derived bounds hold with equality except for at most countably many points where discontinuity issues might arise.

[215]  arXiv:2405.01963 [pdf, other]
Title: From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep Learning (DL) is rapidly maturing to the point that it can be used in safety- and security-critical applications. However, adversarial samples, which are imperceptible to the human eye, pose a serious threat that can cause the model to misbehave and compromise the performance of such applications. Addressing the robustness of DL models has become crucial to understanding and defending against adversarial attacks. In this study, we perform comprehensive experiments to examine the effect of adversarial attacks and defenses on various model architectures across well-known datasets. Our research focuses on black-box attacks such as SimBA, HopSkipJump, MGAAttack, and boundary attacks, as well as preprocessor-based defensive mechanisms, including bit squeezing, median smoothing, and JPEG filtering. Experimenting with various models, our results demonstrate that the level of noise needed for the attack increases as the number of layers increases. Moreover, the attack success rate decreases as the number of layers increases. This indicates that model complexity and robustness have a significant relationship. Investigating the relationship between diversity and robustness, our experiments with diverse models show that a large number of parameters does not imply higher robustness. Our experiments also show the effects of the training dataset on model robustness, using datasets such as ImageNet-1000, CIFAR-100, and CIFAR-10 to evaluate the black-box attacks. Considering the multiple dimensions of our analysis, e.g., model complexity and training dataset, we examined the behavior of black-box attacks when models apply defenses. Our results show that applying defense strategies can significantly reduce attack effectiveness. This research provides in-depth analysis and insight into the robustness of DL models against various attacks and defenses.
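Two of the preprocessor-based defenses named above can be sketched in a few lines. The example below assumes a grayscale image given as a 2-D list of 8-bit values: bit squeezing quantizes pixels to 2**bits levels, and median smoothing replaces each pixel with the median of its neighborhood (edge pixels replicated). The bit depth and window size are illustrative, the function names are hypothetical, and the JPEG filter is not shown.

```python
def squeeze_bits(pixels, bits):
    """Quantize 8-bit pixels to 2**bits levels, then map back to the 0-255 range."""
    levels = 2 ** bits - 1
    return [[round(round(p / 255 * levels) / levels * 255) for p in row] for row in pixels]


def median_smooth(pixels, size=3):
    """Median filter over a size x size window, replicating edge pixels."""
    h, w, r = len(pixels), len(pixels[0]), size // 2
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            window = sorted(
                pixels[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            )
            out_row.append(window[len(window) // 2])
        out.append(out_row)
    return out


print(squeeze_bits([[0, 100, 200, 255]], 1))
print(median_smooth([[0, 0, 0], [0, 255, 0], [0, 0, 0]]))
```

Both transforms discard fine-grained pixel detail: small adversarial perturbations tend to fall inside a quantization bin or be outvoted by their neighbors, which is the intuition behind using them as input preprocessors.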

[216]  arXiv:2405.01971 [pdf, other]
Title: A Sonar-based AUV Positioning System for Underwater Environments with Low Infrastructure Density
Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024
Journal-ref: IEEE ICRA Workshop on Field Robotics 2024
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

The increasing demand for underwater vehicles highlights the necessity for robust localization solutions in inspection missions. In this work, we present a novel real-time sonar-based underwater global positioning algorithm for AUVs (Autonomous Underwater Vehicles) designed for environments with a sparse distribution of human-made assets. Our approach exploits two synergistic data interpretation frontends applied to the same stream of sonar data acquired by a multibeam Forward-Looking Sonar (FLS). These observations are fused within a Particle Filter (PF) either to assign greater weight to particles that belong to high-likelihood regions or to resolve symmetric ambiguities. Preliminary experiments carried out in a simulated environment resembling a real underwater plant provided promising results. This work represents a starting point for future developments of the method and subsequent exhaustive evaluations, including in real-world scenarios.
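The two particle-filter operations mentioned above, reweighting particles by measurement likelihood and resampling, can be sketched generically. The example below assumes a hypothetical 1-D setting with a single known landmark and a Gaussian range likelihood with standard deviation `sigma`; it does not model the paper's sonar frontends. Systematic resampling is shown with its random offset `u0` passed in explicitly so the result is deterministic.

```python
import math


def update_weights(particles, weights, measured_range, landmark, sigma):
    """Reweight 1-D particles by a Gaussian range likelihood and renormalize."""
    new = []
    for x, w in zip(particles, weights):
        err = measured_range - abs(landmark - x)
        new.append(w * math.exp(-0.5 * (err / sigma) ** 2))
    total = sum(new)
    if total == 0.0:
        return [1.0 / len(new)] * len(new)  # degenerate case: reset to uniform
    return [w / total for w in new]


def systematic_resample(weights, u0):
    """Systematic resampling with a single offset u0 in [0, 1); returns particle indices."""
    n = len(weights)
    indices, cumulative, i = [], weights[0], 0
    for k in range(n):
        target = (u0 + k) / n
        while target > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


w = update_weights([0.0, 1.0, 2.0, 3.0], [0.25] * 4, measured_range=3.0, landmark=5.0, sigma=0.5)
print(w)
print(systematic_resample([0.1, 0.2, 0.3, 0.4], 0.5))
```

Note that a pure range measurement is symmetric: a particle at 8.0 would score exactly as well as one at 2.0. This is the kind of ambiguity a second, complementary cue is needed to resolve.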

[217]  arXiv:2405.01972 [pdf, other]
Title: A quantitative and typological study of Early Slavic participle clauses and their competition
Authors: Nilo Pedrazzini
Comments: 259 pages, 138 figures. DPhil Thesis in Linguistics submitted and defended at the University of Oxford (December 2023). This manuscript is a version formatted for improved readability and broader dissemination
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

This thesis is a corpus-based, quantitative, and typological analysis of the functions of Early Slavic participle constructions and their finite competitors (jegda-'when'-clauses). The first part leverages detailed linguistic annotation on Early Slavic corpora at the morphosyntactic, dependency, information-structural, and lexical levels to obtain indirect evidence for different potential functions of participle clauses and their main finite competitor, and to understand the roles of compositionality and default discourse reasoning as explanations for the distribution of participle constructions and jegda-clauses in the corpus. The second part uses massively parallel data to analyze typological variation in how languages express the semantic space of English 'when', whose scope encompasses that of Early Slavic participle constructions and jegda-clauses. Probabilistic semantic maps are generated and statistical methods (including Kriging, Gaussian Mixture Modelling, and precision and recall analysis) are used to induce cross-linguistically salient dimensions from the parallel corpus and to study conceptual variation within the semantic space of the hypothetical concept WHEN.

[218]  arXiv:2405.01974 [pdf, other]
Title: Multitask Extension of Geometrically Aligned Transfer Encoder
Comments: 7 pages, 3 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Molecular datasets often suffer from a lack of data. It is well-known that gathering data is difficult due to the complexity of experimentation or simulation involved. Here, we leverage mutual information across different tasks in molecular data to address this issue. We extend an algorithm that utilizes the geometric characteristics of the encoding space, known as the Geometrically Aligned Transfer Encoder (GATE), to a multi-task setup. Thus, we connect multiple molecular tasks by aligning the curved coordinates onto locally flat coordinates, ensuring the flow of information from source tasks to support performance on target data.

[219]  arXiv:2405.01975 [pdf, other]
Title: Introducing a microstructure-embedded autoencoder approach for reconstructing high-resolution solution field from reduced parametric space
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

In this study, we develop a novel multi-fidelity deep learning approach that transforms low-fidelity solution maps into high-fidelity ones by incorporating parametric space information into a standard autoencoder architecture. We show that, owing to the integration of parametric space data, this method requires significantly less training data to effectively predict high-fidelity solutions from low-fidelity ones. We focus on 2D steady-state heat transfer analysis in a highly heterogeneous material microstructure, where the spatial distribution of heat conductivity coefficients for two distinct materials is condensed. The boundary value problem is then solved on the coarsest grid using a pre-trained physics-informed neural operator network. Afterward, the calculated low-fidelity result is upscaled using the newly designed enhanced autoencoder. The novelty of the enhanced autoencoder lies in concatenating heat conductivity maps of different resolutions to the decoder segment in distinct steps. We then compare the outcomes of the developed algorithm with the corresponding finite element results, a standard U-Net architecture, and other upscaling approaches such as interpolation functions of varying orders and feedforward neural networks (FFNN). The results show that the new approach outperforms the other approaches in terms of computational cost and error on the test cases. Therefore, as a potential supplement to neural operator networks, our architecture upscales low-fidelity solutions to high-fidelity ones while preserving critical details, especially at sharp interfaces, that are often lost in conventional upscaling methods such as interpolation.

[220]  arXiv:2405.01976 [pdf, other]
Title: Conformal Prediction for Natural Language Processing: A Survey
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

The rapid proliferation of large language models and natural language processing (NLP) applications creates a crucial need for uncertainty quantification to mitigate risks such as hallucinations and to enhance decision-making reliability in critical applications. Conformal prediction is emerging as a theoretically sound and practically useful framework, combining flexibility with strong statistical guarantees. Its model-agnostic and distribution-free nature makes it particularly promising to address the current shortcomings of NLP systems that stem from the absence of uncertainty quantification. This paper provides a comprehensive survey of conformal prediction techniques, their guarantees, and existing applications in NLP, pointing to directions for future research and open challenges.
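As a concrete instance of the framework surveyed above, split conformal prediction for classification needs only a held-out calibration set. The sketch below assumes a classifier that outputs class probabilities and uses the common nonconformity score 1 - p(true label); the threshold is the ceil((n + 1)(1 - alpha))-th smallest calibration score. Under exchangeability, the resulting prediction sets contain the true label with probability at least 1 - alpha (marginally). Function names are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split conformal threshold from calibration data; score = 1 - p(true label)."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the finite-sample quantile
    if rank > n:
        return float("inf")  # too few calibration points: every label is included
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Nine calibration examples whose true label (1) received probability 0.9 down to 0.1.
cal_probs = [[1 - q, q] for q in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)]
cal_labels = [1] * 9
t = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(t, prediction_set([0.15, 0.85], t), prediction_set([0.5, 0.5], t))
```

A confident input yields a single-label set, while an ambiguous one yields both labels. The size of the set is the model-agnostic uncertainty signal that makes the approach attractive for NLP systems.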

[221]  arXiv:2405.01978 [pdf, other]
Title: Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications
Authors: Vegard Flovik
Comments: Working paper
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Distribution shifts, where statistical properties differ between training and test datasets, present a significant challenge in real-world machine learning applications where they directly impact model generalization and robustness. In this study, we explore model adaptation and generalization by utilizing synthetic data to systematically address distributional disparities. Our investigation aims to identify the prerequisites for successful model adaptation across diverse data distributions, while quantifying the associated uncertainties. Specifically, we generate synthetic data using the Van der Waals equation for gases and employ quantitative measures such as Kullback-Leibler divergence, Jensen-Shannon distance, and Mahalanobis distance to assess data similarity. These metrics enable us to evaluate both model accuracy and quantify the associated uncertainty in predictions arising from data distribution shifts. Our findings suggest that utilizing statistical measures, such as the Mahalanobis distance, to determine whether model predictions fall within the low-error "interpolation regime" or the high-error "extrapolation regime" provides a complementary method for assessing distribution shift and model uncertainty. These insights hold significant value for enhancing model robustness and generalization, essential for the successful deployment of machine learning applications in real-world scenarios.
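The three similarity measures named above have standard definitions. The sketch below assumes discrete distributions given as probability lists (for KL and Jensen-Shannon) and a 2-D Gaussian summary of the training data (for Mahalanobis); base-2 logarithms are used for the Jensen-Shannon distance so that it lies in [0, 1]. Function names are hypothetical, and the threshold separating the "interpolation" and "extrapolation" regimes is not given in the abstract, so none is shown.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the JS divergence computed with base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = (kl_divergence(p, m) + kl_divergence(q, m)) / 2 / math.log(2)
    return math.sqrt(max(jsd, 0.0))


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from a distribution with given mean and covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of the 2x2 covariance
    dx = [x[0] - mean[0], x[1] - mean[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


print(js_distance([1.0, 0.0], [0.0, 1.0]))
print(mahalanobis_2d([0.0, 3.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```

Unlike Euclidean distance, the Mahalanobis distance scales each direction by the training data's spread: a point 2 units away along a high-variance axis (variance 4) is only 1 standard unit out, while a point 3 units away along a unit-variance axis is 3 units out and thus more likely to fall in the extrapolation regime.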

[222]  arXiv:2405.01979 [pdf, other]
Title: Graph Neural Network based Active and Passive Beamforming for Distributed STAR-RIS-Assisted Multi-User MISO Systems
Comments: 13 pages, 7 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper investigates a joint active and passive beamforming design for distributed simultaneous transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS)-assisted multi-user (MU) multiple-input single-output (MISO) systems, where the energy splitting (ES) mode is considered for the STAR-RIS. We aim to design the active beamforming vectors at the base station (BS) and the passive beamforming at the STAR-RIS to maximize the user sum rate under transmit power constraints. The formulated problem is non-convex, and obtaining the global optimum is nontrivial due to the coupling between the active beamforming vectors and the STAR-RIS phase shifts. To solve the problem efficiently, we propose a novel graph neural network (GNN)-based framework. Specifically, we first model the interactions among users and network entities using a heterogeneous graph representation. A heterogeneous graph neural network (HGNN) implementation is then introduced to directly optimize the beamforming vectors and STAR-RIS coefficients with the system objective. Numerical results show that the proposed approach achieves efficient performance compared to previous benchmarks. Furthermore, the proposed GNN is scalable to various system configurations.

[223]  arXiv:2405.01983 [pdf, other]
Title: Model-based reinforcement learning for protein backbone design
Subjects: Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)

Designing protein nanomaterials of predefined shape and characteristics has the potential to dramatically impact the medical industry. Machine learning (ML) has proven successful in protein design, reducing the need for expensive wet lab experiment rounds. However, challenges persist in efficiently exploring protein fitness landscapes to identify optimal protein designs. In response, we propose the use of AlphaZero to generate protein backbones that meet shape and structural scoring requirements. We extend an existing Monte Carlo tree search (MCTS) framework by incorporating a novel threshold-based reward and secondary objectives to improve design precision. This innovation considerably outperforms existing approaches, leading to protein backbones that better respect structural scores. The application of AlphaZero is novel in the context of protein backbone design and demonstrates promising performance. AlphaZero consistently surpasses baseline MCTS by more than 100% in top-down protein design tasks. Additionally, our application of AlphaZero with secondary objectives uncovers further promising outcomes, indicating the potential of model-based reinforcement learning (RL) in navigating the intricate and nuanced aspects of protein design.

[224]  arXiv:2405.01988 [pdf, other]
Title: Joint sentiment analysis of lyrics and audio in music
Comments: published at DAGA 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Sentiment or mood can be expressed on various levels in music. In automatic analysis, the audio data itself is usually analyzed, but the lyrics can also play a crucial role in the perception of moods. We first evaluate various models for sentiment analysis based on lyrics and audio separately. These approaches already show satisfactory results, but they also exhibit weaknesses, whose causes we examine in more detail. Furthermore, different approaches to combining the audio and lyrics results are proposed and evaluated. Considering both modalities generally leads to improved performance. We investigate misclassifications and (including intentional) contradictions between audio and lyrics sentiment more closely, and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in emotion taxonomies.

[225]  arXiv:2405.01990 [pdf, other]
Title: Soft Label PU Learning
Subjects: Machine Learning (cs.LG)

PU learning refers to the classification problem in which only some of the positive samples are labeled. Existing PU learning methods treat all unlabeled samples equally. However, in many real tasks, common sense or domain knowledge indicates that some unlabeled samples are more likely to be positive than others. In this paper, we propose soft label PU learning, in which unlabeled data are assigned soft labels according to their probabilities of being positive. Since the ground-truth TPR, FPR, and AUC are unknown, we design PU counterparts of these metrics to evaluate the performance of soft label PU learning methods on validation data. We show that these newly designed PU metrics are good substitutes for the real metrics. We then propose a method that optimizes these metrics. Experiments on public datasets and on real datasets for anti-cheat services from Tencent games demonstrate the effectiveness of our proposed method.

[226]  arXiv:2405.01992 [pdf, other]
Title: SFFNet: A Wavelet-Based Spatial and Frequency Domain Fusion Network for Remote Sensing Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In order to fully utilize spatial information for segmentation and address the challenge of handling areas with significant grayscale variations in remote sensing segmentation, we propose the SFFNet (Spatial and Frequency Domain Fusion Network) framework. This framework employs a two-stage network design: the first stage extracts features using spatial methods to obtain features with sufficient spatial detail and semantic information; the second stage maps these features in both the spatial and frequency domains. In the frequency domain mapping, we introduce the Wavelet Transform Feature Decomposer (WTFD) structure, which decomposes features into low-frequency and high-frequency components using the Haar wavelet transform and integrates them with spatial features. To bridge the semantic gap between frequency and spatial features, and to facilitate significant feature selection that promotes the combination of features from different representation domains, we design the Multiscale Dual-Representation Alignment Filter (MDAF). This structure utilizes multiscale convolutions and dual-cross attentions. Comprehensive experimental results demonstrate that, compared to existing methods, SFFNet achieves superior performance in terms of mIoU, reaching 84.80% and 87.73% respectively. The code is available at https://github.com/yysdck/SFFNet.
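To make the low/high-frequency split concrete, here is a minimal single-level, orthonormal 2-D Haar transform on one small feature map. It is a generic sketch of the Haar decomposition the abstract names, not the WTFD module itself, whose internals (channels, normalization, fusion with spatial features) are not described here. Subband names and sign conventions differ between libraries.

```python
def haar2d(x):
    """One level of the orthonormal 2-D Haar transform of an even-sized grid.

    Returns four half-resolution subbands: ll (low frequency) and three
    high-frequency detail bands. Subband naming conventions vary by library.
    """
    rows, cols = len(x), len(x[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid sides must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, rows, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, cols, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # local average (low frequency)
            r_lh.append((a - b + c - d) / 2)  # difference between columns
            r_hl.append((a + b - c - d) / 2)  # difference between rows
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh

def inverse_haar2d(ll, lh, hl, hh):
    """Exactly reconstruct the grid from the four subbands."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, h, v, g = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + h + v + g) / 2, (s - h + v - g) / 2]
            bottom += [(s + h - v - g) / 2, (s - h - v + g) / 2]
        out += [top, bottom]
    return out

feature_map = [[1, 2], [3, 4]]
print(haar2d(feature_map))  # ([[5.0]], [[-1.0]], [[-2.0]], [[0.0]])
```

Because the transform is invertible, splitting into subbands loses no information. A network can therefore process low- and high-frequency content separately and still have access to the full signal.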

[227]  arXiv:2405.01995 [pdf, other]
Title: Cooperation and Federation in Distributed Radar Point Cloud Processing
Journal-ref: 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

The paper considers the problem of human-scale RF sensing utilizing a network of resource-constrained MIMO radars with low range-azimuth resolution. The radars operate in the mmWave band and obtain time-varying 3D point cloud (PC) information that is sensitive to body movements. They also observe the same scene from different views and cooperate while sensing the environment using a sidelink communication channel. Conventional cooperation setups allow the radars to mutually exchange raw PC information to improve ego sensing. The paper proposes a federation mechanism in which the radars exchange the parameters of a Bayesian posterior measure of the observed PCs, rather than raw data. The radars act as distributed parameter servers to reconstruct a global posterior (i.e., a federated posterior) using Bayesian tools. The paper quantifies and compares the benefits of radar federation with respect to cooperation mechanisms. Both approaches are validated by experiments with a real-time demonstration platform. Federation makes minimal use of the sidelink communication channel (20 to 25 times lower bandwidth use) and is less sensitive to unresolved targets. On the other hand, cooperation reduces the mean absolute target estimation error by about 20%.
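To show why exchanging posterior parameters saves bandwidth, the sketch below fuses per-radar posteriors in the simplest possible setting. It assumes each radar summarizes one target coordinate as a 1-D Gaussian (mean, variance), that observations are conditionally independent, and that the prior is flat. Under those assumptions the global posterior is the product of the local ones. This is a deliberately simplified stand-in, not the paper's federated posterior over point clouds.

```python
def fuse_gaussian_posteriors(posteriors):
    """Combine local 1-D Gaussian posteriors N(mean, var) by multiplying them.

    Assumes conditionally independent observations and a flat prior, so the
    global posterior is Gaussian with summed precisions.
    """
    if not posteriors:
        raise ValueError("need at least one posterior")
    precision = sum(1.0 / var for _, var in posteriors)
    mean = sum(mu / var for mu, var in posteriors) / precision
    return mean, 1.0 / precision

# Each radar shares two numbers per coordinate instead of its raw points.
radar_a = (1.0, 1.0)  # hypothetical (mean, variance) of a target coordinate
radar_b = (3.0, 1.0)
print(fuse_gaussian_posteriors([radar_a, radar_b]))  # (2.0, 0.5)
```

The fused mean is a precision-weighted average, so a radar with a sharper (lower-variance) view of the target pulls the estimate harder. The fused variance is smaller than either local one.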

[228]  arXiv:2405.01997 [pdf, ps, other]
Title: Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) are deep learning models designed to generate text based on textual input. Although researchers have been developing these models for more complex tasks such as code generation and general reasoning, few efforts have explored how LLMs can be applied to combinatorial problems. In this research, we investigate the potential of LLMs to solve the Travelling Salesman Problem (TSP). Using GPT-3.5 Turbo, we conducted experiments employing various approaches, including zero-shot in-context learning, few-shot in-context learning, and chain-of-thought (CoT) prompting. Subsequently, we fine-tuned GPT-3.5 Turbo to solve a specific problem size and tested it on a set of instances of various sizes. The fine-tuned models demonstrated promising performance on problems identical in size to the training instances and generalized well to larger problems. Furthermore, to improve the performance of the fine-tuned model without incurring additional training costs, we adopted a self-ensemble approach to improve the quality of the solutions.
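One plausible shape for a self-ensemble over TSP answers is sketched below. Sample several tours from the model, discard any that are not valid permutations of the cities, and keep the shortest. The exact aggregation used in the paper is not stated here, so the selection rule is an assumption, and the `samples` list stands in for hypothetical parsed model outputs.

```python
import math

def tour_length(tour, coords):
    """Length of the closed tour that visits coords in the order given."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def is_valid_tour(tour, n):
    """A valid tour visits each of the n cities exactly once."""
    return sorted(tour) == list(range(n))

def best_of(candidates, coords):
    """Keep the shortest valid tour among several sampled answers, or None."""
    valid = [t for t in candidates if is_valid_tour(t, len(coords))]
    if not valid:
        return None
    return min(valid, key=lambda t: tour_length(t, coords))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
samples = [[0, 2, 1, 3], [0, 1, 1, 3], [0, 1, 2, 3]]  # hypothetical model outputs
print(best_of(samples, square))  # [0, 1, 2, 3]
```

The validity check matters for LLM outputs, which can repeat or skip cities. Filtering before ranking ensures the ensemble never returns an infeasible tour.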

[229]  arXiv:2405.01999 [pdf, other]
Title: Semi-Automatic Infrared Calibration for Augmented Reality Systems in Surgery
Comments: Published in conference proceedings for 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). For associated code visit: this https URL
Subjects: Robotics (cs.RO)

Augmented reality (AR) has the potential to improve the immersion and efficiency of computer-assisted orthopaedic surgery (CAOS) by allowing surgeons to maintain focus on the operating site rather than on external displays in the operating theatre. Successful deployment of AR to CAOS requires a calibration that can accurately calculate the spatial relationship between real and holographic objects. Several studies attempt this calibration through manual alignment or with additional fiducial markers in the surgical scene. We propose a calibration system that offers a direct method for calibrating AR head-mounted displays (HMDs) with CAOS systems by using the infrared-reflective marker arrays widely used in CAOS. In our fast, user-agnostic setup, a HoloLens 2 detected the pose of marker arrays using infrared response and time-of-flight depth obtained through sensors onboard the HMD. Registration with a commercially available CAOS system was achieved when an IR marker array was visible to both devices. Tests found relative-tracking mean errors of 2.03 mm and 1.12° when calculating the relative pose between two static marker arrays at short ranges. When using the calibration result to provide in-situ holographic guidance for a simulated wire-insertion task, a pre-clinical test reported mean errors of 2.07 mm and 1.54° compared to a pre-planned trajectory.

[230]  arXiv:2405.02002 [pdf, ps, other]
Title: Optimizing Robot Dispersion on Grids: with and without Fault Tolerance
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The dispersion of mobile robots across the nodes of an anonymous graph has recently gained traction and has been explored within various graph classes and settings. While an optimal dispersion solution was established for oriented grids [Kshemkalyani et al., WALCOM 2020], a significant open question is whether optimal dispersion is achievable on an unoriented grid. This paper investigates the dispersion problem on unoriented grids, considering both non-faulty and faulty robots. The challenge posed by unoriented grids lies in the absence of a clear sense of direction for a single robot moving between nodes, as opposed to the straightforward navigation of oriented grids.
We present three deterministic algorithms tailored to our robot model. The first and second algorithms deal with the dispersion of faulty and non-faulty robots, optimizing both time and memory in oriented and unoriented grids, respectively. Faulty robots are prone to crashing at any time, causing permanent failure. In both settings, we achieve dispersion in $O(\sqrt{n})$ rounds while requiring $O(\log n)$ bits of memory per robot. The third algorithm tackles faulty robots prone to crash faults in an unoriented grid. In this scenario, our algorithm runs in $O(\sqrt{n} \log n)$ time and uses $O(\sqrt{n} \log n)$ bits of memory per robot. The robots need to know the value of $n$ for termination.

[231]  arXiv:2405.02004 [pdf, other]
Title: M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel self-supervised two-frame multi-camera metric depth estimation network, termed M${^2}$Depth, which is designed to predict reliable scale-aware surrounding depth in autonomous driving. Unlike previous works that use multi-view images from a single time step or multiple time-step images from a single camera, M${^2}$Depth takes temporally adjacent two-frame images from multiple cameras as input and produces high-quality surrounding depth. We first construct cost volumes in the spatial and temporal domains individually and propose a spatial-temporal fusion module that integrates the spatial-temporal information to yield a strong volume representation. We additionally combine the neural prior from SAM features with internal features to reduce the ambiguity between foreground and background and strengthen the depth edges. Extensive experimental results on the nuScenes and DDAD benchmarks show that M${^2}$Depth achieves state-of-the-art performance. More results can be found at https://heiheishuang.xyz/M2Depth .

[232]  arXiv:2405.02005 [pdf, other]
Title: HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2
Comments: 8 pages, 9 figures, 2 tables. Will be published in the ISPRS The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the fields of photogrammetry, computer vision, and computer graphics, the task of neural 3D scene reconstruction has led to the exploration of various techniques. Among these, 3D Gaussian Splatting stands out for its explicit representation of scenes using 3D Gaussians, making it appealing for tasks like 3D point cloud extraction and surface reconstruction. Motivated by its potential, we address the domain of 3D scene reconstruction, aiming to leverage the capabilities of the Microsoft HoloLens 2 for instant 3D Gaussian Splatting. We present HoloGS, a novel workflow utilizing HoloLens sensor data, which bypasses the need for pre-processing steps like Structure from Motion by instantly accessing the required input data, i.e., the images, camera poses, and the point cloud from depth sensing. We provide comprehensive investigations, including the training process and the rendering quality, assessed through the Peak Signal-to-Noise Ratio, and the geometric 3D accuracy of the densified point cloud from Gaussian centers, measured by Chamfer Distance. We evaluate our approach on two self-captured scenes: an outdoor scene of a cultural heritage statue and an indoor scene of a fine-structured plant. Our results show that the HoloLens data, including RGB images, corresponding camera poses, and depth-sensing-based point clouds to initialize the Gaussians, are suitable as input for 3D Gaussian Splatting.
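For readers unfamiliar with the geometric metric mentioned above, here is a brute-force sketch of a symmetric Chamfer distance between two 3-D point sets. Definitions vary across papers (squared vs. plain distances, summing vs. averaging the two directions), and which variant HoloGS uses is not stated here. The version below, averaging mean nearest-neighbour Euclidean distances, is one common choice and is assumed for illustration.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Averages the mean nearest-neighbour Euclidean distance from a to b and
    from b to a. Brute force O(|a| * |b|); other conventions use squared
    distances or sum the two directions instead of averaging.
    """
    def mean_nn(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (mean_nn(a, b) + mean_nn(b, a))

reconstructed = [(0.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)]
print(chamfer_distance(reconstructed, reference))  # 1.5
```

Measuring both directions penalizes missing geometry (reference points far from any reconstructed point) as well as spurious geometry (reconstructed points far from the reference). For large point clouds, a k-d tree would replace the brute-force nearest-neighbour search.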

[233]  arXiv:2405.02008 [pdf, other]
Title: DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited utilization of structured priors inherent in map segmentation masks. In light of this, we propose DiffMap, a novel approach specifically designed to model the structured priors of map segmentation masks using a latent diffusion model. By incorporating this technique, the performance of existing semantic segmentation methods can be significantly enhanced, and certain structural errors in the segmentation outputs can be effectively rectified. Notably, the proposed module can be seamlessly integrated into any map segmentation model, thereby augmenting its capability to accurately delineate semantic information. Furthermore, extensive visualization analysis shows that our model generates results that more accurately reflect real-world map layouts, further validating its efficacy in improving the quality of the generated maps.

[234]  arXiv:2405.02010 [pdf, other]
Title: The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification
Comments: Accepted to the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP) at NAACL 2024
Subjects: Computation and Language (cs.CL)

Current natural language processing (NLP) research tends to focus on only one or, less frequently, two dimensions at a time (e.g., performance, privacy, fairness, or efficiency), which may lead to suboptimal conclusions and often overlooks the broader goal of achieving trustworthy NLP. Work on adapter modules (Houlsby et al., 2019; Hu et al., 2021) focuses on improving performance and efficiency, with no investigation of unintended consequences for other aspects such as fairness. To address this gap, we conduct experiments on three text classification datasets by either (1) fine-tuning all parameters or (2) using adapter modules. Regarding performance and efficiency, we confirm prior findings that the accuracy of adapter-enhanced models is roughly on par with that of fully fine-tuned models, while training time is substantially reduced. Regarding fairness, we show that adapter modules result in mixed fairness across sensitive groups. Further investigation reveals that, when the standard fine-tuned model exhibits limited biases, adapter modules typically do not introduce extra bias. On the other hand, when the fine-tuned model exhibits increased bias, the impact of adapter modules on bias becomes more unpredictable, introducing the risk of significantly magnifying these biases for certain groups. Our findings highlight the need for case-by-case evaluation rather than a one-size-fits-all judgment.

[235]  arXiv:2405.02011 [pdf, other]
Title: Autonomous Active Mapping in Steep Alpine Environments with Fixed-wing Aerial Vehicles
Comments: 8 pages, 8 figures, Accepted to the IEEE ICRA Workshop on Field Robotics 2024
Subjects: Robotics (cs.RO)

Monitoring large-scale environments is a crucial task for managing remote alpine regions, especially with respect to hazardous events such as avalanches. One key source of information for avalanche risk forecasting is imagery of released avalanches. Because these occur in remote and potentially dangerous locations, such data are difficult to obtain. Fixed-wing vehicles, due to their long range and travel speeds, are a promising platform for gathering aerial imagery to map avalanche activity. However, operating such vehicles in mountainous terrain remains challenging due to the complex topography, regulations, and uncertain environment. In this work, we present a system capable of safely navigating and mapping an avalanche using a fixed-wing aerial system, and we discuss the challenges that arise when executing such a mission. Our field experiments show that we can effectively navigate steep terrain while maximizing map quality. We expect our work to enable more autonomous operations of fixed-wing vehicles in alpine environments, maximizing the quality of the data gathered.

[236]  arXiv:2405.02016 [pdf, other]
Title: Adversarial Botometer: Adversarial Analysis for Social Bot Detection
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Social bots play a significant role in many online social networks (OSNs) as they imitate human behavior. This raises difficult questions about their capabilities and potential risks. Given recent advances in Generative AI (GenAI), social bots are capable of producing highly realistic and complex content that mimics human creativity. As malicious social bots emerge to deceive people with inauthentic content, identifying them and distinguishing the content they produce has become a real challenge for numerous social platforms. Several approaches to this problem have been proposed in the literature, but the proposed solutions have not been widely evaluated. To address this issue, we evaluate the behavior of a text-based bot detector in a competitive environment under three scenarios. First, we examine the tug-of-war between a bot and a bot detector, analyzing which party is more likely to prevail and which circumstances influence this outcome. To this end, we model the problem as a synthetic adversarial game in which a conversational bot and a bot detector engage in strategic online interactions. Second, we evaluate the bot detection model under attack examples generated by a social bot; to this end, we poison the dataset with attack examples and evaluate model performance under this condition. Finally, to investigate the impact of the dataset, we perform a cross-domain analysis. Through a comprehensive evaluation of different categories of social bots on two benchmark datasets, we demonstrate findings that could be utilized in future work.

[237]  arXiv:2405.02019 [pdf, other]
Title: Fast Algorithms for Spiking Neural Network Simulation with FPGAs
Comments: 34 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Hardware Architecture (cs.AR); Performance (cs.PF)

Using OpenCL-based high-level synthesis, we create a number of spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit for a high-end Field-Programmable Gate Array (FPGA). Our best simulators simulate the circuit 25% faster than real time, require less than 21 nJ per synaptic event, and are bottlenecked by the device's on-chip memory. Speed-wise, they compare favorably to state-of-the-art GPU-based simulators, and their energy usage is lower than any other published result. This is the first result for simulating the circuit on a single hardware accelerator. We also extensively analyze the techniques and algorithms with which we implement our simulators, many of which can be realized on other types of hardware. Thus, this article is of interest to any researcher or practitioner interested in efficient SNN simulation, whether or not they target FPGAs.

[238]  arXiv:2405.02022 [pdf, other]
Title: STX-Vote: Improving Reliability with Bit Voting in Synchronous Transmission-based IoT Networks
Subjects: Networking and Internet Architecture (cs.NI)

Industrial Internet of Things (IIoT) networks must meet strict reliability, latency, and low energy consumption requirements. However, traditional low-power wireless protocols are ineffective at finding a sweet spot that balances these performance metrics. Recently, network flooding protocols based on Synchronous Transmissions (STX) have been proposed for better performance in reliability-critical IIoT, where simultaneous transmissions are possible without packet collisions. STX-based protocols can offer a competitive edge over routing-based protocols, particularly in dependability. However, they notably suffer from the beating effect, a physical layer phenomenon that results in sinusoidal interference across a packet and, consequently, packet loss. Thus, we introduce STX-Vote, an error correction scheme that can handle errors caused by beating effects. Importantly, we utilize the transmission redundancy already inherent in STX protocols and therefore incur no additional on-air overhead. Through simulation, we demonstrate that STX-Vote can provide a 40% increase in reliability. We subsequently implement STX-Vote on nRF52840-DK devices and perform extensive experiments. The results confirm that STX-Vote improves reliability by 25-28% for BLE 5 PHYs and 8% for IEEE 802.15.4; thus, it can complement existing error correction schemes.
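The core idea of bit voting can be illustrated with a bitwise majority over several corrupted copies of the same packet. This is a generic sketch: how STX-Vote collects copies, handles an even number of receptions, or interacts with the CRC is not described here, so those details are left out. The example assumes an odd number of equal-length copies.

```python
def majority_vote(copies):
    """Bitwise majority vote over an odd number of equally long received copies."""
    if len(copies) % 2 == 0:
        raise ValueError("use an odd number of copies to avoid ties")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must have the same length")
    out = bytearray(length)
    for i in range(length):
        for bit in range(8):
            # Count how many copies have this bit set.
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if 2 * ones > len(copies):
                out[i] |= 1 << bit
    return bytes(out)

sent = b"\x0f\xf0"
received = [b"\x0e\xf0", b"\x0f\x70", b"\x0f\xf0"]  # two copies hit by different bit errors
print(majority_vote(received) == sent)  # True
```

Voting recovers the packet as long as errors at any given bit position affect only a minority of the copies. If the same bit is corrupted in most copies, the vote returns the wrong value, so an integrity check is still needed afterwards.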

[239]  arXiv:2405.02023 [pdf, other]
Title: IFNet: Deep Imaging and Focusing for Handheld SAR with Millimeter-wave Signals
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advancements have showcased the potential of handheld millimeter-wave (mmWave) imaging, which applies synthetic aperture radar (SAR) principles in portable settings. However, existing studies addressing handheld motion errors either rely on costly tracking devices or employ simplified imaging models, leading to impractical deployment or limited performance. In this paper, we present IFNet, a novel deep unfolding network that combines the strengths of signal processing models and deep neural networks to achieve robust imaging and focusing for handheld mmWave systems. We first formulate the handheld imaging model by integrating multiple priors about mmWave images and handheld phase errors. Furthermore, we transform the optimization processes into an iterative network structure for improved and efficient imaging performance. Extensive experiments demonstrate that IFNet effectively compensates for handheld phase errors and recovers high-fidelity images from severely distorted signals. In comparison with existing methods, IFNet can achieve at least 11.89 dB improvement in average peak signal-to-noise ratio (PSNR) and 64.91% improvement in average structural similarity index measure (SSIM) on a real-world dataset.

[240]  arXiv:2405.02024 [pdf, other]
Title: Analyzing Narrative Processing in Large Language Models (LLMs): Using GPT4 to test BERT
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The ability to transmit and receive complex information via language is unique to humans and is the basis of traditions, culture, and versatile social interactions. Through the disruptive introduction of transformer-based large language models (LLMs), humans are no longer the only entity to "understand" and produce language. In the present study, we take first steps toward using LLMs as a model for understanding fundamental mechanisms of language processing in neural networks, in order to make predictions and generate hypotheses about how the human brain processes language. To this end, we used ChatGPT to generate seven different stylistic variations of ten different narratives (Aesop's fables). We used these stories as input to the open-source LLM BERT and analyzed the activation patterns of BERT's hidden units using multi-dimensional scaling and cluster analysis. We found that the activation vectors of the hidden units cluster according to stylistic variation in earlier layers of BERT (layer 1) than according to narrative content (layers 4-5). Although BERT consists of 12 identical building blocks that are stacked and trained on large text corpora, the different layers perform different tasks. This makes it a very useful model of the human brain, where self-similar structures, i.e., different areas of the cerebral cortex, can have different functions and are therefore well suited to processing language very efficiently. The proposed approach has the potential, on the one hand, to open the black box of LLMs and, on the other, might be a further step toward unraveling the neural processes underlying human language processing and cognition in general.

[241]  arXiv:2405.02026 [pdf, other]
Title: Diversity of What? On the Different Conceptualizations of Diversity in Recommender Systems
Journal-ref: FAccT 2024
Subjects: Information Retrieval (cs.IR)

Diversity is a commonly known principle in the design of recommender systems, but also ambiguous in its conceptualization. Through semi-structured interviews we explore how practitioners at three different public service media organizations in the Netherlands conceptualize diversity within the scope of their recommender systems. We provide an overview of the goals that they have with diversity in their systems, which aspects are relevant, and how recommendations should be diversified. We show that even within this limited domain, conceptualization of diversity greatly varies, and argue that it is unlikely that a standardized conceptualization will be achieved. Instead, we should focus on effective communication of what diversity in this particular system means, thus allowing for operationalizations of diversity that are capable of expressing the nuances and requirements of that particular domain.

[242]  arXiv:2405.02029 [pdf, other]
Title: MemorAI: Energy-Efficient Last-Level Cache Memory Optimization for Virtualized RANs
Subjects: Networking and Internet Architecture (cs.NI)

The virtualization of Radio Access Networks (vRAN) is well on its way to becoming a reality, driven by advantages such as flexibility and cost-effectiveness. However, virtualization comes at a high price: virtual Base Stations (vBSs) sharing the same computing platform incur a significant computing overhead due to extreme consumption of shared cache memory resources. Consequently, vRAN suffers from increased energy consumption, which adds to the already high operational costs of 5G networks. This paper investigates the effectiveness of cache memory allocation mechanisms in reducing total energy consumption. Using an experimental vRAN platform, we profile the energy consumption and CPU utilization of a vBS as a function of the network state (e.g., traffic demand, modulation scheme). Then, we address the high dimensionality of the problem by decomposing it per vBS, which is possible thanks to the Last-Level Cache (LLC) isolation implemented in our system. Based on this, we train a vBS digital twin, which allows us to train a classifier offline, avoiding performance degradation of the system during training. Our results show that our approach performs very close to an offline optimal oracle, outperforming standard approaches used in today's deployments.

[243]  arXiv:2405.02030 [pdf, other]
Title: Obstacle Avoidance of Autonomous Vehicles: An LPVMPC with Scheduling Trust Region
Subjects: Systems and Control (eess.SY)

Reference tracking and obstacle avoidance rank among the most challenging aspects of autonomous driving. This paper proposes control designs for solving reference tracking problems in autonomous driving tasks while considering static obstacles. We suggest a model predictive control (MPC) strategy that avoids the computational burden of nonlinear nonconvex optimization methods by equivalently embedding the nonlinear model into a linear parameter-varying (LPV) formulation using the so-called scheduling parameter. This allows optimal and fast solutions of the underlying convex optimization scheme as a quadratic program (QP), at the expense of losing some performance due to the uncertainty of the future scheduling trajectory over the MPC horizon. Also, to ensure that the modeling error due to the application of the scheduling parameter predictions does not become significant, we propose the concept of a scheduling trust region by enforcing further soft constraints on the states and inputs. A consequence of using the new constraints in the MPC is that we construct a region in which the scheduling parameter updates in two consecutive time instants are trusted for computing the system matrices, and therefore the feasibility of the MPC optimization problem is retained. We test the method in different scenarios and compare the results to standard LPVMPC as well as nonlinear MPC (NMPC) schemes.

[244]  arXiv:2405.02040 [pdf, ps, other]
Title: Large Multimodal Model based Standardisation of Pathology Reports with Confidence and their Prognostic Significance
Comments: 19 pages, 6 figures
Subjects: Computation and Language (cs.CL)

Pathology reports are rich in clinical and pathological details but are often presented in free-text format. The unstructured nature of these reports presents a significant challenge that limits the accessibility of their content. In this work, we present a practical approach based on large multimodal models (LMMs) for automatically extracting information from scanned images of pathology reports, with the goal of generating a standardised report that specifies the value of different fields along with an estimated confidence in the accuracy of each extracted field. The proposed approach overcomes limitations of existing methods, which do not assign confidence scores to extracted fields, limiting their practical use. The proposed framework uses two stages of prompting a Large Multimodal Model (LMM) for information extraction and validation. The framework generalises to textual reports from multiple medical centres as well as scanned images of legacy pathology reports. We show that the estimated confidence is an effective indicator of the accuracy of the extracted information and can be used to select only accurately extracted fields. We also show the prognostic significance of structured and unstructured data from pathology reports, and show that the automatically extracted field values have significant prognostic value for patient stratification. The framework is available for evaluation via the URL: https://labieb.dcs.warwick.ac.uk/.

[245]  arXiv:2405.02041 [pdf, other]
Title: Stabilizing Backpropagation Through Time to Learn Complex Physics
Comments: Published at ICLR 2024, code available at this https URL
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Of all the vector fields surrounding the minima of recurrent learning setups, the gradient field, with its exploding and vanishing updates, appears to be a poor choice for optimization, offering little beyond efficient computability. We seek to improve this suboptimal practice in the context of physics simulations, where backpropagating feedback through many unrolled time steps is considered crucial to acquiring temporally coherent behavior. The alternative vector field we propose follows from two principles: physics simulators, unlike neural networks, have a balanced gradient flow, and certain modifications to the backpropagation pass leave the positions of the original minima unchanged. As any modification of backpropagation decouples the forward and backward passes, the rotation-free character of the gradient field is lost. Therefore, we discuss the negative implications of using such a rotational vector field for optimization and how to counteract them. Our final procedure is easily implementable via a sequence of gradient stopping and component-wise comparison operations, which do not negatively affect scalability. Our experiments on three control problems show that, especially as the complexity of each task increases, the unbalanced updates from the gradient can no longer provide the necessary precise control signals, while our method still solves the tasks. Our code can be found at https://github.com/tum-pbs/StableBPTT.

[246]  arXiv:2405.02042 [pdf, other]
Title: Sampling to Achieve the Goal: An Age-aware Remote Markov Decision Process
Comments: 12 pages, 4 figures
Subjects: Information Theory (cs.IT)

Age of Information (AoI) has been recognized as an important metric for measuring the freshness of information. Central to this consensus is the view that minimizing AoI enhances the freshness of information, thereby improving the accuracy of subsequent decision-making processes. However, the direct causal relationship linking AoI to the utility of the decision-making process has so far remained unexplored. To fill this gap, this paper formulates a sampling-control co-design problem, referred to as an age-aware remote Markov Decision Process (MDP) problem, to investigate this relationship. Our framework revisits the sampling problem in [1] with a refined focus: moving from AoI penalty minimization to directly optimizing a goal-oriented remote decision-making process under random delay. We show that the age-aware remote MDP problem can be reduced to a standard MDP problem without delays, and reveal that treating AoI solely as a metric to be optimized is not optimal for remote decision making. Instead, AoI can serve as important side information to facilitate remote decision making.
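For reference, the AoI at time t is the standard quantity Δ(t) = t − U(t), where U(t) is the generation time of the freshest update delivered by time t. A minimal sketch computing it from (generation, delivery) time pairs under random delay; the function name and the assumption that the age grows from an initial value at time 0 are illustrative, not taken from the paper.

```python
def age_of_information(updates, t, initial_age=0.0):
    """Return the AoI at time t.

    updates: iterable of (generation_time, delivery_time) pairs.
    Before any delivery, the age is assumed to grow linearly from initial_age at time 0.
    """
    # Only updates already delivered by time t count; max() handles out-of-order delivery.
    delivered = [gen for gen, dlv in updates if dlv <= t]
    if not delivered:
        return initial_age + t
    return t - max(delivered)
```

For example, with updates `[(0.0, 1.5), (2.0, 2.5), (3.0, 5.0)]` the age at t = 4.0 is 2.0, because the sample generated at 3.0 is still in flight.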

[247]  arXiv:2405.02043 [pdf, ps, other]
Title: On human-centred security: A new systems model based on modes and mode transitions
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

We propose an abstract conceptual framework for analysing complex security systems using a new notion of modes and mode transitions. A mode is an independent component of a system with its own objectives, monitoring data, algorithms, and scope and limits. The behaviour of a mode, including its transitions to other modes, is determined by interpretations of the mode's monitoring data in the light of its objectives and capabilities; we call these interpretations beliefs. We formalise the conceptual framework mathematically and, by quantifying and visualising beliefs in higher-dimensional geometric spaces, we argue that our models may help to design, analyse and explain systems. The mathematical models are based on simplicial complexes.

[248]  arXiv:2405.02044 [pdf, other]
Title: Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY); Optimization and Control (math.OC)

Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them more efficient for real-world applications. Following this paradigm, uncertainty or disturbances are interpreted as actions of a second, adversarial agent, and the problem thus reduces to seeking agents' policies that are robust to any of the opponent's actions. This paper is the first to propose considering RRL problems within positional differential game theory, which provides theoretically justified intuition for developing a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their superiority over other baseline RRL and Multi-Agent RL algorithms in various environments.
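Isaacs's condition says the order of min and max can be interchanged, which is why one Q-function can serve both Bellman equations. On a finite action grid, the analogous pure-strategy check is whether maximin equals minimax for a Q-matrix at a given state. The sketch below assumes the agent picks rows and maximizes while the adversary picks columns and minimizes; it only performs this check and is not the Isaacs Deep Q-Network algorithm.

```python
def maximin(q):
    # Agent commits to a row first; the adversary answers with the worst column.
    return max(min(row) for row in q)

def minimax(q):
    # Adversary commits to a column first; the agent answers with the best row.
    n_cols = len(q[0])
    return min(max(row[v] for row in q) for v in range(n_cols))

def has_pure_saddle_point(q, tol=1e-9):
    # When both orders give the same value, either Bellman operator yields the same Q.
    return abs(maximin(q) - minimax(q)) <= tol
```

For `[[3, 1], [4, 2]]` both orders give 2, while matching pennies `[[1, -1], [-1, 1]]` gives −1 versus 1, so no pure saddle point exists there.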

[249]  arXiv:2405.02045 [pdf, other]
Title: Are We in The Zone? Exploring The Features and Method of Detecting Simultaneous Flow Experiences Based on EEG Signals
Subjects: Human-Computer Interaction (cs.HC)

When executing interdependent personal tasks for the team's purpose, simultaneous individual flow (simultaneous flow) is the antecedent condition for achieving shared team flow. Detecting simultaneous flow helps to better understand the status of team members, and is thus important for optimizing multi-user interaction systems. However, objective features and methods for detecting simultaneous flow are currently underexplored. Based on the brain mechanisms of flow in teamwork and previous studies on electroencephalogram (EEG)-based individual flow detection, this study aims to explore the significant EEG features related to simultaneous flow, as well as effective detection methods based on EEG signals. First, a two-player simultaneous flow task is designed, based on which we construct the first multi-EEG-signal dataset of simultaneous flow. Then, we explore the potential EEG signal features that may be related to individual and simultaneous flow and validate their effectiveness in simultaneous flow detection with various machine learning models. The results show that 1) inter-brain synchrony features are relevant to simultaneous flow, as they enhance the models' performance in detecting different types of simultaneous flow; 2) features from the frontal lobe area seem to warrant priority attention when detecting simultaneous flow; 3) Random Forests performed best in binary classification, while Neural Network and Deep Neural Network3 performed best in ternary classification.
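The abstract does not say which inter-brain synchrony measures are used. One widely used measure is the phase locking value (PLV) between two participants' channels, so the sketch below assumes PLV as an illustration. It builds the analytic signal with the FFT (the same construction as SciPy's `hilbert`) and omits the band-pass filtering normally applied first.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real 1-D signal, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0          # double positive frequencies, zero negative ones
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def phase_locking_value(x, y):
    """PLV in [0, 1]: 1 for a constant phase lag, near 0 for unrelated phases."""
    phase_diff = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return float(np.abs(np.mean(np.exp(1j * phase_diff))))
```

A per-band PLV between, say, frontal channels of the two players would then be one candidate feature for the classifiers.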

[250]  arXiv:2405.02047 [pdf, other]
Title: Small Logic-based Multipliers with Incomplete Sub-Multipliers for FPGAs
Comments: Preprint, to appear at ARITH 2024 (this http URL) and IEEEXplore
Subjects: Hardware Architecture (cs.AR)

There is a recent trend in artificial intelligence (AI) inference towards lower-precision data formats, down to 8 bits and less. As multiplication is the most complex operation in typical inference tasks, there is a large demand for efficient small multipliers. The large DSP blocks are limited in their ability to implement many small multipliers efficiently. Hence, this work proposes a solution for better logic-based multipliers that is especially beneficial for small multipliers. Our work is based on the multiplier tiling method, in which a multiplier is designed out of several sub-multiplier tiles. Our key observation is that these sub-multipliers do not necessarily have to perform a complete (rectangular) NxK multiplication, and that more efficient sub-multipliers are possible that are incomplete (non-rectangular). This proposal first seeks to identify efficient incomplete, irregular sub-multipliers and then demonstrates improvements over state-of-the-art designs. We show that optimal solutions can be found using integer linear programming (ILP) and evaluate them in FPGA synthesis experiments.
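The tiling method can be illustrated with its classical, complete form. In the sketch below, an 8×8 unsigned product is built from four rectangular 4×4 sub-multiplier tiles whose partial products are shifted to their bit weights and summed. The tile size is an assumption, and the incomplete (non-rectangular) tiles proposed in the paper are not modeled here.

```python
def tiled_multiply(a, b, width=8, tile=4):
    """Multiply two unsigned `width`-bit integers from tile x tile sub-products."""
    assert width % tile == 0, "width must be a multiple of the tile size"
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    mask = (1 << tile) - 1
    result = 0
    for i in range(0, width, tile):        # bit offset of the current slice of a
        a_slice = (a >> i) & mask
        for j in range(0, width, tile):    # bit offset of the current slice of b
            b_slice = (b >> j) & mask
            # Each tile is a small tile x tile multiplier...
            partial = a_slice * b_slice
            # ...whose output is shifted to its weight and accumulated.
            result += partial << (i + j)
    return result
```

In hardware, each `a_slice * b_slice` corresponds to one tile (for example, LUT-based), and the shifted additions form the final compressor tree.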

[251]  arXiv:2405.02048 [pdf, ps, other]
Title: Comparative Analysis of Retrieval Systems in the Real World
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

This research paper presents a comprehensive analysis of integrating advanced language models with search and retrieval systems in the fields of information retrieval and natural language processing. The objective is to evaluate and compare various state-of-the-art methods based on their performance in terms of accuracy and efficiency. The analysis explores different combinations of technologies, including Azure Cognitive Search Retriever with GPT-4, Pinecone's Canopy framework, Langchain with Pinecone and different language models (OpenAI, Cohere), LlamaIndex with Weaviate Vector Store's hybrid search, Google's RAG implementation on Cloud VertexAI-Search, Amazon SageMaker's RAG, and a novel approach called KG-FID Retrieval. The motivation for this analysis arises from the increasing demand for robust and responsive question-answering systems in various domains. The RobustQA metric is used to evaluate the performance of these systems under diverse paraphrasing of questions. The report aims to provide insights into the strengths and weaknesses of each method, facilitating informed decisions in the deployment and development of AI-driven search and retrieval systems.

[252]  arXiv:2405.02050 [pdf, other]
Title: Ah, that's the great puzzle: On the Quest of a Holistic Understanding of the Harms of Recommender Systems on Children
Comments: 7 pages, 2 figures, DCDW 2024
Journal-ref: Designing for Children's Digital Well-being: A Research, Policy and Practice Agenda (DCDW '24), co-located with ACM IDC 2024
Subjects: Information Retrieval (cs.IR)

Children come across various media items online, many of which are selected by recommender systems (RS) primarily designed for adults. The specific nature of the content selected by RS to display on online platforms used by children - although not necessarily targeting them as a user base - remains largely unknown. This raises questions about whether such content is appropriate given children's vulnerable stages of development and the potential risks to their well-being.
In this position paper, we reflect on the relationship between RS and children, emphasizing the possible adverse effects of the content this user group might be exposed to online. As a step towards fostering safer interactions for children in online environments, we advocate for researchers, practitioners, and policymakers to undertake a more comprehensive examination of the impact of RS on children - one focused on harms. This would result in a more holistic understanding that could inform the design and deployment of strategies that better suit children's needs and preferences while actively mitigating the potential harm posed by RS. We acknowledge that identifying and addressing these harms is complex and multifaceted.

[253]  arXiv:2405.02053 [pdf, other]
Title: Solving Sequential Manipulation Puzzles by Finding Easier Subproblems
Comments: Accepted to ICRA 2024
Subjects: Robotics (cs.RO)

We consider a set of challenging sequential manipulation puzzles, where an agent has to interact with multiple movable objects and navigate narrow passages. Such settings are notoriously difficult for Task-and-Motion Planners, as they require interdependent regrasps and solving hard motion planning problems. In this paper, we propose to search over sequences of easier pick-and-place subproblems, which can lead to the solution of the manipulation puzzle. Our method combines a heuristic-driven forward search of subproblems with an optimization-based Task-and-Motion Planning solver. To guide the search, we introduce heuristics to generate and prioritize useful subgoals. We evaluate our approach on various manually designed and automatically generated scenes, demonstrating the benefits of auxiliary subproblems in sequential manipulation planning.

[254]  arXiv:2405.02060 [pdf, other]
Title: Federated Learning for Tabular Data using TabNet: A Vehicular Use-Case
Comments: 7 pages, 9 figures, 1 table, ICCP Conference 2022
Subjects: Machine Learning (cs.LG)

In this paper, we show how Federated Learning (FL) can be applied to vehicular use-cases in which we seek to classify obstacles, irregularities and pavement types on roads. Our proposed framework utilizes FL and TabNet, a state-of-the-art neural network for tabular data. We are the first to demonstrate how TabNet can be integrated with FL. Moreover, we achieve a maximum test accuracy of 93.6%. Finally, we explain why FL is a suitable concept for this data set.
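The abstract does not specify how client models are aggregated. Federated averaging (FedAvg) is the usual default, so the sketch below assumes it: the server averages client parameter vectors, weighting each by its local sample count. Flat lists of floats stand in for TabNet's weights, which is a simplification.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for k in range(n_params):
            global_weights[k] += weights[k] * size / total
    return global_weights
```

Each round, the server would broadcast `global_weights`, the vehicles would train locally on their own road data, and the server would average again.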

[255]  arXiv:2405.02061 [pdf, other]
Title: Towards general deep-learning-based tree instance segmentation models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The segmentation of individual trees from forest point clouds is a crucial task for downstream analyses such as carbon sequestration estimation. Recently, deep-learning-based methods have been proposed that show the potential of learning to segment trees. Since these methods are trained in a supervised way, the question arises of how general models that are applicable across a wide range of settings can be obtained. So far, training has mainly been conducted with data from one specific laser scanning type and for specific types of forests. In this work, we train one segmentation model under various conditions, using seven diverse datasets found in the literature, to gain insights into its generalization capabilities under domain shift. Our results suggest that generalization from coniferous-dominated sparse point clouds to deciduous-dominated high-resolution point clouds is possible. Conversely, qualitative evidence suggests that generalization from high-resolution to low-resolution point clouds is challenging. This emphasizes the need for forest point clouds with diverse data characteristics for model development. To enrich the available data basis, labeled trees from two previous works were propagated to the complete forest point cloud and are made publicly available at https://doi.org/10.25625/QUTUWU.

[256]  arXiv:2405.02062 [pdf, other]
Title: Dyna-Style Learning with A Macroscopic Model for Vehicle Platooning in Mixed-Autonomy Traffic
Subjects: Machine Learning (cs.LG)

Platooning of connected and autonomous vehicles (CAVs) plays a vital role in modernizing highways, ushering in enhanced efficiency and safety. This paper explores the significance of platooning in smart highways, employing a coupled partial differential equation (PDE) and ordinary differential equation (ODE) model to elucidate the complex interaction between bulk traffic flow and CAV platoons. Our study focuses on developing a Dyna-style planning and learning framework tailored for platoon control, with a specific goal of reducing fuel consumption. By harnessing the coupled PDE-ODE model, we improve data efficiency in Dyna-style learning through virtual experiences. Simulation results validate the effectiveness of our macroscopic model in modeling platoons within mixed-autonomy settings, demonstrating a notable $10.11\%$ reduction in vehicular fuel consumption compared to conventional approaches.
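Dyna-style learning mixes real transitions with virtual ones drawn from a model. The sketch below is textbook tabular Dyna-Q, where `model_step` stands in for the paper's coupled PDE-ODE macroscopic model. The toy chain environment and all parameter values are hypothetical and unrelated to platoon control; the point is only where the virtual experiences enter the update loop.

```python
import random

def dyna_q(env_step, model_step, states, actions, start, episodes=50,
           planning_steps=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q with a known model used for virtual (planning) updates.

    env_step(s, a) and model_step(s, a) both return (next_state, reward, done);
    `states` lists the non-terminal states.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in states for a in actions}

    def greedy(s):
        return max(actions, key=lambda a: q[(s, a)])

    def update(s, a, r, s_next, done):
        # One-step Q-learning target; terminal transitions have no bootstrap term.
        target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s_next, r, done = env_step(s, a)               # one real experience
            update(s, a, r, s_next, done)
            for _ in range(planning_steps):                # several virtual experiences
                vs, va = rng.choice(states), rng.choice(actions)
                vs_next, vr, vdone = model_step(vs, va)
                update(vs, va, vr, vs_next, vdone)
            s = s_next
    return q
```

Because every real step is followed by `planning_steps` model-based updates, value information spreads with far fewer real interactions, which is the data-efficiency argument made in the abstract.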

[257]  arXiv:2405.02063 [pdf, other]
Title: Few-sample Variational Inference of Bayesian Neural Networks with Arbitrary Nonlinearities
Authors: David J. Schodt
Subjects: Machine Learning (cs.LG)

Bayesian Neural Networks (BNNs) extend traditional neural networks to provide uncertainties associated with their outputs. On the forward pass through a BNN, predictions (and their uncertainties) are made either by Monte Carlo sampling network weights from the learned posterior or by analytically propagating statistical moments through the network. Though flexible, Monte Carlo sampling is computationally expensive and can be infeasible or impractical under resource constraints or for large networks. While moment propagation can ameliorate the computational costs of BNN inference, it can be difficult or impossible for networks with arbitrary nonlinearities, thereby restricting the possible set of network layers permitted with such a scheme. In this work, we demonstrate a simple yet effective approach for propagating statistical moments through arbitrary nonlinearities with only 3 deterministic samples, enabling few-sample variational inference of BNNs without restricting the set of network layers used. Furthermore, we leverage this approach to demonstrate a novel nonlinear activation function that we use to inject physics-informed prior information into output nodes of a BNN.
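The abstract does not name its 3-point rule. One standard scheme that propagates the mean and variance of a scalar Gaussian through an arbitrary nonlinearity with exactly three deterministic points is the unscented transform with n = 1 and κ = 2, whose points and weights coincide with 3-point Gauss-Hermite quadrature. The sketch below uses that scheme as an illustration, not as the author's method.

```python
import math

def three_point_moments(f, mean, var, kappa=2.0):
    """Approximate mean and variance of f(x) for x ~ N(mean, var) with 3 sigma points.

    Scalar unscented transform (n = 1). With kappa = 2 the output-mean estimate is
    exact for polynomial f of degree up to 5.
    """
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(p) for p in points]                      # three deterministic forward passes
    out_mean = sum(w * y for w, y in zip(weights, ys))
    out_var = sum(w * (y - out_mean) ** 2 for w, y in zip(weights, ys))
    return out_mean, out_var
```

For example, `three_point_moments(lambda x: x * x, 0.0, 1.0)` returns approximately (1.0, 2.0), which are the exact moments of a squared standard normal. Any activation, such as `math.tanh`, can be passed the same way.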

[258]  arXiv:2405.02066 [pdf, other]
Title: WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Advances in Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to either implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and combine it with a patch-wise loss to improve rendering quality and bit accuracy with minimal trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training than the compared state-of-the-art methods.
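As a refresher on the transform involved, the sketch below implements a one-level 2D Haar DWT and its inverse with NumPy. It splits an array into LL, LH, HL and HH subbands. The choice of the Haar wavelet, and which NeRF quantities it would be applied to, are assumptions, and subband naming conventions vary between libraries.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT of an array with even height and width."""
    x = np.asarray(x, dtype=float)
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2      # pairwise sums along each row
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2      # pairwise differences along each row
    ll = (lo[0::2, :] + lo[1::2, :]) / SQRT2    # then the same along columns
    lh = (lo[0::2, :] - lo[1::2, :]) / SQRT2
    hl = (hi[0::2, :] + hi[1::2, :]) / SQRT2
    hh = (hi[0::2, :] - hi[1::2, :]) / SQRT2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    h, w = ll.shape
    lo, hi = np.empty((2 * h, w)), np.empty((2 * h, w))
    lo[0::2, :], lo[1::2, :] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2, :], hi[1::2, :] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x
```

Watermarking schemes built on the DWT typically embed the message in selected subbands and rely on the inverse transform to keep the change visually small.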

[259]  arXiv:2405.02067 [pdf, other]
Title: Histogram-Based Federated XGBoost using Minimal Variance Sampling for Federated Tabular Data
Comments: 6 figures, 5 tables, 8 pages, FLTA 2023 (together with FMEC 2023)
Subjects: Machine Learning (cs.LG)

Federated Learning (FL) has gained considerable traction, yet FL for tabular data has received less attention. Most FL research has focused on Neural Networks, while Tree-Based Models (TBMs) such as XGBoost have historically performed better on tabular data. It has been shown that subsampling of training data when building trees can improve performance, but it is an open problem whether such subsampling can improve performance in FL. In this paper, we evaluate a histogram-based federated XGBoost that uses Minimal Variance Sampling (MVS). We demonstrate the underlying algorithm and show that our model using MVS can improve performance in terms of accuracy and regression error in a federated setting. In our evaluation, our model using MVS performs better than uniform (random) sampling and no sampling at all. It achieves outstanding performance both locally and globally on a new set of federated tabular datasets. Federated XGBoost using MVS also outperforms centralized XGBoost in half of the studied cases.
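For readers unfamiliar with MVS, the sketch below follows its common formulation for gradient boosting. Each example gets a regularized score √(g² + λh²) and is kept with probability p = min(1, score/μ), where μ is chosen so that the probabilities sum to the expected sample size; kept examples are weighted by 1/p. The bisection for μ and the parameter names are illustrative choices, and the histogram construction and federated aggregation are not shown.

```python
import math
import random

def mvs_probabilities(grads, hessians, sample_size, lam=1.0, iters=100):
    """Minimal Variance Sampling probabilities p_i = min(1, score_i / mu)."""
    scores = [math.sqrt(g * g + lam * h * h) for g, h in zip(grads, hessians)]
    n = len(scores)
    if sample_size >= n:
        return [1.0] * n
    total = sum(scores)
    if total == 0.0:
        return [sample_size / n] * n          # no gradient signal: fall back to uniform
    lo, hi = 0.0, total / sample_size         # at mu = hi the expected size is <= sample_size
    for _ in range(iters):                    # bisection on the threshold mu
        mid = 0.5 * (lo + hi)
        expected = sum(min(1.0, s / mid) for s in scores)
        if expected > sample_size:
            lo = mid
        else:
            hi = mid
    return [min(1.0, s / hi) for s in scores]

def mvs_sample(probs, rng):
    """Keep example i with probability p_i and weight it by 1 / p_i (unbiased gradient sums)."""
    return [(i, 1.0 / p) for i, p in enumerate(probs) if rng.random() < p]
```

Examples with large gradients are always kept, while many small-gradient examples are dropped and the survivors are up-weighted, which is what keeps the variance of the split-gain estimate low.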

[260]  arXiv:2405.02068 [pdf, other]
Title: Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection
Comments: The paper is under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has made significant progress in the past few years. The success of knowledge distillation mainly relies on maintaining the feature discrepancy between the teacher and student models, under two assumptions: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution. However, maintaining these ideal assumptions in practice remains challenging. In this paper, we propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND, which sequentially performs Anomaly Amplification and Normality Distillation to obtain robust feature discrepancy. In the first (anomaly amplification) stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder. With exposure to synthetic anomalies, it amplifies anomalies via residual generation while maintaining the integrity of the pre-trained model. It mainly comprises a Matching-guided Residual Gate and an Attribute-scaling Residual Generator, which determine the residuals' proportion and characteristics, respectively. In the second (normality distillation) stage, we further employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns. Comprehensive experiments on the MvTecAD, VisA, and MvTec3D-RGB datasets show that our method achieves state-of-the-art performance.
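In teacher-student and reverse-distillation detectors, the feature discrepancy is commonly turned into an anomaly map as one minus the cosine similarity between teacher and student features at each spatial location. The sketch below assumes that common formulation for a single feature level of shape (C, H, W). The RAA module and the HKD loss themselves are not modeled.

```python
import numpy as np

def anomaly_map(teacher_feat, student_feat, eps=1e-8):
    """Per-location anomaly score 1 - cos(teacher, student) for features of shape (C, H, W).

    Scores are near 0 where the student reproduces the teacher (normal regions)
    and grow toward 2 where the two feature vectors point in opposite directions.
    """
    t = teacher_feat / (np.linalg.norm(teacher_feat, axis=0, keepdims=True) + eps)
    s = student_feat / (np.linalg.norm(student_feat, axis=0, keepdims=True) + eps)
    return 1.0 - (t * s).sum(axis=0)
```

Under the two assumptions listed in the abstract, normal regions yield near-zero scores and anomalous regions yield large ones; in practice, maps from several levels are usually upsampled and summed.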

[261]  arXiv:2405.02070 [pdf, other]
Title: Strategies for Intrusion Monitoring in Cloud Services
Comments: 5 pages
Journal-ref: Proc of the 8th International Conference on Cloud Computing, GRIDs, and Virtualization (Cloud Computing 2017), Athens, Greece, February 2017, pp. 49-53, ISSN 2308-4294
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Effective activity and event monitoring is an essential aspect of digital forensic readiness. Techniques for capturing log and other event data are familiar from conventional networked hosts and transfer directly to the Cloud context. In both contexts, a major concern is the risk that monitoring systems may be targeted and impaired by intruders seeking to conceal their illicit presence and activities. We outline an approach to intrusion monitoring that aims (i) to ensure the credibility of log data and (ii) to provide a means of data sharing that supports log reconstruction in the event that one or more logging systems are maliciously impaired.
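The abstract does not describe how log credibility is ensured. Hash chaining is one common technique for making tampering with stored log entries detectable, so the sketch below is an illustrative assumption rather than the paper's mechanism. Each digest covers the entry and the previous digest, so editing or deleting a middle entry breaks every later link.

```python
import hashlib

GENESIS = "0" * 64  # fixed starting value for the chain

def chain_logs(entries):
    """Return (entry, digest) pairs where each digest covers the entry and the previous digest."""
    prev = GENESIS
    chained = []
    for entry in entries:
        digest = hashlib.sha256((prev + entry).encode("utf-8")).hexdigest()
        chained.append((entry, digest))
        prev = digest
    return chained

def verify_chain(chained):
    """Recompute every digest; any edited or removed middle entry makes this return False."""
    prev = GENESIS
    for entry, digest in chained:
        if hashlib.sha256((prev + entry).encode("utf-8")).hexdigest() != digest:
            return False
        prev = digest
    return True
```

A chain alone cannot reveal truncation of its newest entries. Replicating the latest digest to a separate logging system is one way to cover that case, in the spirit of the data sharing the abstract describes.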

[262]  arXiv:2405.02073 [pdf, other]
Title: Iterative Reconstruction Methods for Cosmological X-Ray Tomography
Comments: 22 pages; codes for this paper will be made available at this https URL once the revision process is complete
Subjects: Numerical Analysis (math.NA)

We consider the imaging of cosmic strings by using Cosmic Microwave Background (CMB) data. Mathematically, we study the inversion of an X-ray transform in Lorentzian geometry, called the light ray transform. The inverse problem is highly ill-posed, with additional complexities of being large-scale and dynamic, with unknown parameters that represent multidimensional objects. This presents significant computational challenges for the numerical reconstruction of images that have high spatial and temporal resolution. In this paper, we begin with a microlocal stability analysis for inverting the light ray transform using the Landweber iteration. Next, we discretize the spatiotemporal object and light ray transform and consider iterative computational methods for solving the resulting inverse problem. We provide a numerical investigation and comparison of some advanced iterative methods for regularization including Tikhonov and sparsity-promoting regularizers for various example scalar functions with conormal type singularities.
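The Landweber iteration mentioned above is x_{k+1} = x_k + ω Aᵀ(b − A x_k), which converges for 0 < ω < 2/‖A‖₂², with early stopping acting as regularization. The sketch below assumes a generic discretized forward matrix `A`. The discretization of the light ray transform itself is not shown, and the small diagonal example is purely illustrative.

```python
import numpy as np

def landweber(A, b, n_iter=500, omega=None, x0=None):
    """Landweber iteration x_{k+1} = x_k + omega * A^T (b - A x_k)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    if omega is None:
        omega = 1.0 / np.linalg.norm(A, 2) ** 2   # step safely inside (0, 2 / ||A||_2^2)
    x = np.zeros(A.shape[1]) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = x + omega * A.T @ (b - A @ x)        # gradient step on 0.5 * ||A x - b||^2
    return x
```

Components associated with small singular values converge slowest, so stopping after few iterations suppresses exactly the unstable directions that make the problem ill-posed.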

[263]  arXiv:2405.02074 [pdf, other]
Title: A Federated Learning Benchmark on Tabular Data: Comparing Tree-Based Models and Neural Networks
Comments: 8 pages, 6 figures, 6 tables, FMEC 2023 (best paper)
Subjects: Machine Learning (cs.LG)

Federated Learning (FL) has lately gained traction as it addresses how machine learning models can be trained on distributed datasets. FL was designed for parametric models, namely Deep Neural Networks (DNNs). Thus, it has shown promise on image and text tasks. However, FL for tabular data has received little attention. Tree-Based Models (TBMs) have been considered to perform better on tabular data, and they are starting to see FL integrations. In this study, we benchmark federated TBMs and DNNs for horizontal FL, with varying data partitions, on 10 well-known tabular datasets. Our novel benchmark results indicate that current federated boosted TBMs perform better than federated DNNs across different data partitions. Furthermore, a federated XGBoost outperforms all other models. Lastly, we find that federated TBMs perform better than federated parametric models, even when the number of clients is increased significantly.

[264]  arXiv:2405.02077 [pdf, other]
Title: MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent few-shot action recognition (FSAR) methods achieve promising performance by performing semantic matching on learned discriminative features. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocity Progressive-alignment (MVP-Shot) framework to progressively learn and align semantic-related action features at multi-velocity levels. Concretely, a Multi-Velocity Feature Alignment (MVFA) module is designed to measure the similarity between features from support and query videos with different velocity scales and then merge all similarity scores in a residual fashion. To prevent the multiple velocity features from deviating from the underlying motion semantics, our proposed Progressive Semantic-Tailored Interaction (PSTI) module injects velocity-tailored text information into the video feature via feature interaction on channel and temporal domains at different velocities. The above two modules complement each other to predict query categories more accurately under the few-shot settings. Experimental results show our method outperforms current state-of-the-art methods on multiple standard few-shot benchmarks (i.e., HMDB51, UCF101, Kinetics, and SSv2-small).

[265]  arXiv:2405.02079 [pdf, other]
Title: Argumentative Large Language Models for Explainable and Contestable Decision-Making
Comments: 19 pages, 17 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The diversity of knowledge encoded in large language models (LLMs) and their ability to apply this knowledge zero-shot in a range of settings make them a promising candidate for use in decision-making. However, they are currently limited by their inability to reliably provide outputs that are explainable and contestable. In this paper, we attempt to reconcile these strengths and weaknesses by introducing a method for supplementing LLMs with argumentative reasoning. Concretely, we introduce argumentative LLMs, a method utilising LLMs to construct argumentation frameworks, which then serve as the basis for formal reasoning in decision-making. The interpretable nature of these argumentation frameworks and formal reasoning means that any decision made by the supplemented LLM may be naturally explained to, and contested by, humans. We demonstrate the effectiveness of argumentative LLMs experimentally in the decision-making task of claim verification. We obtain results that are competitive with, and in some cases surpass, comparable state-of-the-art techniques.
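The abstract does not fix the argumentation semantics. A widely used gradual semantics for quantitative bipolar argumentation frameworks (QBAFs) is DF-QuAD, so the sketch below assumes it. It evaluates a tree-shaped framework whose base scores would, hypothetically, come from the LLM. Attackers and supporters are each aggregated with a probabilistic sum, and the base score is then moved down or up by the difference.

```python
def aggregate(strengths):
    # Probabilistic sum 1 - prod(1 - v); equals 0 when there are no arguments.
    result = 1.0
    for v in strengths:
        result *= 1.0 - v
    return 1.0 - result

def df_quad(arg, base, attackers, supporters):
    """Final strength of `arg` in an acyclic QBAF under DF-QuAD semantics.

    base: dict argument -> base score in [0, 1]
    attackers / supporters: dict argument -> list of child arguments
    """
    va = aggregate(df_quad(a, base, attackers, supporters) for a in attackers.get(arg, []))
    vs = aggregate(df_quad(s, base, attackers, supporters) for s in supporters.get(arg, []))
    v0 = base[arg]
    if va >= vs:
        return v0 - v0 * (va - vs)          # attacks dominate: pull toward 0
    return v0 + (1.0 - v0) * (vs - va)      # supports dominate: pull toward 1
```

For claim verification, one hypothetical decision rule is to accept the claim when its final strength exceeds 0.5. Each step of the computation can then be shown to, and challenged by, a human.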

[266]  arXiv:2405.02080 [pdf, ps, other]
Title: Coding for Synthesis Defects
Subjects: Information Theory (cs.IT)

Motivated by DNA-based data storage systems, we investigate the errors that occur when synthesizing DNA strands in parallel, where each strand is appended one nucleotide at a time by the machine according to a template supersequence. If the machine fails at some cycle, the strands meant to be appended at this cycle will not be appended; we refer to this as a synthesis defect. In this paper, we present two families of codes correcting synthesis defects: t-known-synthesis-defect correcting codes and t-synthesis-defect correcting codes. For the first family, it is assumed that the defective cycles are known, and each codeword is a quaternary sequence. We provide constructions for this family of codes for t = 1, 2, with redundancy $\log 4$ and $\log n + 18\log 3$, respectively. For the second family, each codeword is a set of M ordered sequences, and we give constructions for t = 1, 2 to illustrate a strategy for constructing this family of codes. Finally, we derive a lower bound on the redundancy of single-known-synthesis-defect correcting codes, which shows that our construction is almost optimal.
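The defect model can be simulated directly. The sketch below assumes that each strand is scheduled by the greedy (leftmost) embedding into the template supersequence, and that a nucleotide scheduled at a defective cycle is simply lost rather than added later. Both choices are illustrative assumptions about the model, not details confirmed by the abstract.

```python
def synthesis_schedule(strand, template):
    """Cycles at which each symbol of `strand` is appended (greedy leftmost embedding).

    Raises ValueError if `strand` is not a subsequence of `template`.
    """
    cycles, c = [], 0
    for symbol in strand:
        while c < len(template) and template[c] != symbol:
            c += 1
        if c == len(template):
            raise ValueError("strand is not a subsequence of the template")
        cycles.append(c)
        c += 1
    return cycles

def synthesize(strands, template, defective_cycles=()):
    """Synthesize all strands in parallel; symbols scheduled at a defective cycle are lost."""
    defective = set(defective_cycles)
    outputs = []
    for strand in strands:
        schedule = synthesis_schedule(strand, template)
        outputs.append("".join(sym for sym, c in zip(strand, schedule) if c not in defective))
    return outputs
```

With template `"ACGT" * 3` and a defect at cycle 2, the strands `["AGT", "CAT", "GGA"]` come out as `["AT", "CAT", "GA"]`. A single defect thus causes correlated deletions across every strand scheduled at that cycle, which is the structure the codes exploit.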

[267]  arXiv:2405.02081 [pdf, other]
Title: A Mutual Information Perspective on Federated Contrastive Learning
Comments: Published as a conference paper at ICLR 2024
Subjects: Machine Learning (cs.LG)

We investigate contrastive learning in the federated setting through the lens of SimCLR and multi-view mutual information maximization. In doing so, we uncover a connection between contrastive representation learning and user verification; by adding a user verification loss to each client's local SimCLR loss we recover a lower bound to the global multi-view mutual information. To accommodate the case where some labelled data are available at the clients, we extend our SimCLR variant to the federated semi-supervised setting. We see that a supervised SimCLR objective can be obtained with two changes: a) the contrastive loss is computed between datapoints that share the same label and b) we require an additional auxiliary head that predicts the correct labels from either of the two views. Along with the proposed SimCLR extensions, we also study how different sources of non-i.i.d.-ness can impact the performance of federated unsupervised learning through global mutual information maximization; we find that a global objective is beneficial for some sources of non-i.i.d.-ness but can be detrimental for others. We empirically evaluate our proposed extensions in various tasks to validate our claims and furthermore demonstrate that our proposed modifications generalize to other pretraining methods.
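As background for the local objective each client optimizes, the sketch below implements the standard SimCLR NT-Xent loss in NumPy. Row i of each view forms a positive pair, and all other embeddings in the batch act as negatives. The user-verification term and the supervised variant described above are not included, and the temperature value is an arbitrary choice.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR NT-Xent loss for two views, each an array of shape (N, d)."""
    z = np.concatenate([z1, z2], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)       # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                           # never contrast a sample with itself
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # index of each row's positive
    row_max = sim.max(axis=1, keepdims=True)                 # stabilized log-sum-exp per row
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())
```

Matched views give a lower loss than mismatched ones. For example, `nt_xent(np.eye(4), np.eye(4))` is smaller than the loss obtained when the second view's rows are rolled by one.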

[268]  arXiv:2405.02083 [pdf, other]
Title: A semantic loss for ontology classification
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

Deep learning models are often unaware of the inherent constraints of the task they are applied to. However, many downstream tasks require logical consistency. For ontology classification tasks, such constraints include subsumption and disjointness relations between classes.
In order to increase the consistency of deep learning models, we propose a semantic loss that combines label-based loss with terms penalising subsumption- or disjointness-violations. Our evaluation on the ChEBI ontology shows that the semantic loss is able to decrease the number of consistency violations by several orders of magnitude without decreasing the classification performance. In addition, we use the semantic loss for unsupervised learning. We show that this can further improve consistency on data from a distribution outside the scope of the supervised training.
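The exact penalty terms are not given in the abstract. The sketch below assumes simple product-style penalties: p(sub)·(1 − p(sup)) for a violated subsumption and p(a)·p(b) for a violated disjointness. These are added to a binary cross-entropy label loss, which can be omitted for the unsupervised use. The class names in the usage example are hypothetical, not actual ChEBI identifiers.

```python
import math

def semantic_loss(probs, subsumptions, disjointness, labels=None, weight=1.0):
    """Label-based BCE plus penalties for violated ontology constraints.

    probs: dict class -> predicted probability in (0, 1)
    subsumptions: list of (sub, sup) pairs meaning every `sub` is also a `sup`
    disjointness: list of (a, b) pairs meaning nothing is both `a` and `b`
    labels: optional dict class -> 0/1; omit it for unsupervised training
    """
    loss = 0.0
    if labels is not None:
        for c, y in labels.items():
            p = probs[c]
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    for sub, sup in subsumptions:             # predicting sub without sup is penalized
        loss += weight * probs[sub] * (1.0 - probs[sup])
    for a, b in disjointness:                 # predicting both disjoint classes is penalized
        loss += weight * probs[a] * probs[b]
    return loss
```

For example, with hypothetical classes `"acid" ⊑ "compound"` and `"acid"` disjoint from `"base"`, a consistent prediction such as `{"acid": 0.9, "compound": 0.95, "base": 0.05}` incurs a penalty of about 0.09. An inconsistent one such as `{"acid": 0.9, "compound": 0.1, "base": 0.8}` incurs about 1.53.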

[269]  arXiv:2405.02086 [pdf, other]
Title: Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
Subjects: Machine Learning (cs.LG)

Projection onto the $\ell_{1,\infty}$ ball is an efficient structured projection, but the complexity of the best algorithm is unfortunately $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m \big)$ with full parallel power. We generalize our method to tensors and propose a new multi-level projection whose induced decomposition yields a linear parallel speedup up to an exponential speedup factor, resulting in a time complexity lower-bounded by the sum of the dimensions. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the currently fastest algorithm, by Chu et al., while providing the same accuracy and better sparsity in neural network applications.
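The sketch below shows one reading of the bi-level idea, assuming the $\ell_{1,\infty}$ norm is taken as the sum over columns of each column's $\ell_\infty$ norm. First, each column is compressed to its maximum absolute value. Second, that length-m vector is projected onto the $\ell_1$ ball. Third, each column is clipped to its new bound. For simplicity the $\ell_1$ step uses the sort-based projection of Duchi et al., which is O(m log m) rather than linear time.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of a nonnegative vector v onto {u >= 0 : sum(u) <= radius}."""
    if v.sum() <= radius:
        return v.copy()
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - radius
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - cssv / ks > 0)[0][-1]     # last index kept in the support
    theta = cssv[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def bilevel_l1inf_projection(Y, radius):
    """Bi-level projection of Y (n x m) toward {X : sum_j max_i |X_ij| <= radius}."""
    Y = np.asarray(Y, dtype=float)
    col_max = np.abs(Y).max(axis=0)                # level 1: l_inf norm of each column
    bounds = project_l1_ball(col_max, radius)      # level 2: l_1 projection of those norms
    return np.clip(Y, -bounds, bounds)             # clip each column to its bound
```

Columns whose bound drops to zero are zeroed out entirely, which is the source of the structured sparsity exploited in sparse auto-encoders.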

[270]  arXiv:2405.02094 [pdf, other]
Title: Numerical validation of an adaptive model for the determination of nonlinear-flow regions in highly heterogeneous porous media
Subjects: Numerical Analysis (math.NA)

An adaptive model for the description of flows in highly heterogeneous porous media was developed in~\cite{FP21,FP23}. There, depending on the magnitude of the fluid's velocity, the constitutive law linking velocity and pressure gradient is selected from two options, one better suited to slow flow and the other to fast flow. Here we further validate this adaptive approach through more extensive numerical experiments, including a three-dimensional case, and use it to partition the domain into slow- and fast-flow regions.
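The abstract does not name the two constitutive laws. A minimal sketch of the per-cell selection, assuming Darcy's law for slow flow, the Forchheimer law for fast flow, and a hypothetical hard speed threshold as the switching criterion (the criterion used in the cited works may differ); the resulting boolean mask is the slow/fast partition of the domain.

```python
import numpy as np

def minus_pressure_gradient(u, mu, k, beta, rho, threshold):
    """u: (n_cells, d) cell velocities; k: scalar or (n_cells,) permeability.
    Returns (-grad p per cell, boolean mask of fast-flow cells)."""
    u = np.asarray(u, dtype=float)
    speed = np.linalg.norm(u, axis=1)
    k = np.broadcast_to(np.asarray(k, dtype=float), speed.shape)[:, None]
    darcy = (mu / k) * u                                     # slow-flow law (linear in u)
    forchheimer = darcy + beta * rho * speed[:, None] * u    # fast-flow law (adds inertial term)
    fast = speed > threshold                                 # hypothetical switching criterion
    return np.where(fast[:, None], forchheimer, darcy), fast
```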

[271]  arXiv:2405.02095 [pdf, ps, other]
Title: Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

The ability to accurately determine code similarity is crucial in many software development tasks. For example, identifying code duplicates can be essential for software maintenance. This research introduces a novel ensemble learning approach for code similarity assessment that combines multiple unsupervised similarity measures. The key idea is that a diverse set of similarity measures can complement each other's strengths and mitigate their individual weaknesses, leading to improved performance. Preliminary results show that while the Transformer-based CodeBERT and its variant GraphCodeBERT are undoubtedly the best option when abundant training data is available, on specific small datasets (up to 500 samples) our ensemble achieves similar results while preserving the interpretability of the resulting solution and with a much lower training-related carbon footprint. The source code of this approach can be downloaded from https://github.com/jorge-martinez-gil/ensemble-codesim.
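A minimal sketch of the ensemble idea, assuming three simple unsupervised measures (token Jaccard, token-sequence matching via Python's `difflib`, and character trigram overlap) combined by a weighted average; the specific measures and equal weights are illustrative assumptions, not the measures used in the linked repository.

```python
import difflib
import re

def tokenize(code):
    # Identifiers, integer literals, and single non-space symbols
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def jaccard_similarity(a, b):
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def sequence_similarity(a, b):
    # Order-aware similarity of the two token sequences
    return difflib.SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()

def char_ngram_similarity(a, b, n=3):
    def grams(code):
        s = " ".join(code.split())  # normalize whitespace
        return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)

def ensemble_similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    scores = (jaccard_similarity(a, b), sequence_similarity(a, b), char_ngram_similarity(a, b))
    return sum(w * s for w, s in zip(weights, scores))
```

Each component score stays interpretable on its own, so a low ensemble score can be traced back to the measure that produced it.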

[272]  arXiv:2405.02098 [pdf, other]
Title: Forecasting Ferry Passenger Flow Using Long-Short Term Memory Neural Networks
Authors: Daniel Fesalbon
Subjects: Machine Learning (cs.LG)

Building on recent studies that apply neural networks to forecasting and time-series problems, this study extends these approaches to ferry passenger traffic. The primary objective is to investigate and evaluate the capability of an LSTM-based neural network to forecast ferry passenger volumes at two ports in the Philippines. The model is fitted and evaluated on monthly passenger traffic data from 2016 to 2022, acquired from the Philippine Ports Authority (PPA). Mean Absolute Percentage Error (MAPE) serves as the primary metric for evaluating forecasting capability. The proposed LSTM-based model achieved 72% forecasting accuracy on the Batangas port ferry passenger data and 74% on the Mindoro port ferry passenger data. Implemented with the Keras and scikit-learn Python libraries, the presented LSTM model shows reasonable forecasting performance. The study also recommends further investigation of other statistical, machine learning, and deep learning methods for forecasting ferry passenger flows.
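For reference, MAPE averages the absolute forecast error relative to each actual value. The sketch below also shows one common (assumed, not confirmed by the abstract) way of turning it into an "accuracy" figure, namely 100 minus MAPE.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mape_accuracy(actual, forecast):
    # Hypothetical convention: accuracy = 100 - MAPE
    return 100.0 - mape(actual, forecast)
```

For example, forecasts of 90 and 220 against actual values of 100 and 200 are each 10% off, giving a MAPE of 10.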

[273]  arXiv:2405.02105 [pdf, other]
Title: Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph
Comments: 22 pages, 11 figures. In review at this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Theory (cs.IT)

Structured science summaries, or research contributions described using properties or dimensions beyond traditional keywords, enhance science findability. Current methods, such as those used by the Open Research Knowledge Graph (ORKG), involve manually curating properties to describe research papers' contributions in a structured manner, but this is labor-intensive and inconsistent across domain-expert human curators. We propose using Large Language Models (LLMs) to automatically suggest these properties. However, it is essential to assess the readiness of LLMs such as GPT-3.5, Llama 2, and Mistral for this task before applying them. Our study performs a comprehensive comparative analysis between ORKG's manually curated properties and those generated by these state-of-the-art LLMs. We evaluate LLM performance from four perspectives: semantic alignment with and deviation from ORKG properties, fine-grained property mapping accuracy, SciNCL embedding-based cosine similarity, and expert surveys comparing manual annotations with LLM outputs. These evaluations take place in a multidisciplinary science setting. Overall, LLMs show potential as recommendation systems for structuring science, but further fine-tuning is recommended to improve their alignment with scientific tasks and their mimicry of human expertise.
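The embedding-based comparison rests on cosine similarity between property embeddings. A minimal sketch, assuming the embeddings have already been computed (e.g. by SciNCL) and using a hypothetical best-match average to compare an LLM-generated property list with the ORKG list; the aggregation is an illustrative choice, not necessarily the study's.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (nu * nv)

def mean_best_match(llm_vectors, orkg_vectors):
    # For each LLM property, take its closest ORKG property, then average
    return sum(max(cosine_similarity(u, v) for v in orkg_vectors) for u in llm_vectors) / len(llm_vectors)
```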

[274]  arXiv:2405.02106 [pdf, ps, other]
Title: Got Root? A Linux Priv-Esc Benchmark
Comments: arXiv admin note: text overlap with arXiv:2310.11409
Subjects: Cryptography and Security (cs.CR)

Linux systems are integral to the infrastructure of modern computing environments, necessitating robust security measures to prevent unauthorized access. Privilege escalation attacks represent a significant threat, typically allowing attackers to elevate their privileges from an initial low-privilege account to the all-powerful root account.
A benchmark set of vulnerable systems is essential for evaluating the effectiveness of privilege-escalation techniques performed by both humans and automated tooling. Analyzing their behavior allows defenders to better fortify the Linux systems entrusted to them and thus protect their infrastructure from potentially devastating attacks.
To address the lack of such a benchmark, we developed a comprehensive benchmark for Linux privilege escalation. It provides a standardized platform to evaluate and compare the performance of human and synthetic actors, e.g., hacking scripts or automated tooling.

[275]  arXiv:2405.02107 [pdf, ps, other]
Title: Equal Requests are Asymptotically Hardest for Data Recovery
Comments: 13 pages
Subjects: Information Theory (cs.IT); Combinatorics (math.CO); Probability (math.PR)

In a distributed storage system serving hot data, data recovery performance becomes important; it is captured, e.g., by the service rate. We give partial evidence that a sequence of equal user requests (as in the PIR coding regime) is the hardest to serve, both for concrete and for random user requests and server contents.
We prove that a constant request sequence is locally hardest to serve: if enough copies of each vector are stored on the servers and a request sequence in which all requests are equal can be served, then the sequence can still be served after a few of its requests are changed.
For random iid server contents, with the number of data symbols held constant (for simplicity) and the number of servers growing, we show that the maximum number of user requests we can serve divided by the number of servers we need approaches a limit almost surely. For uniform server contents, we show that this limit is 1/2, both for sequences of copies of a fixed request and for arbitrary requests, so serving equal requests is at least as hard as serving any requests. For iid requests independent of the uniform server contents, the limit is at least 1/2, and equal to 1/2 if the requests are all equal to a fixed request almost surely, confirming the same conclusion.
As a building block, we deduce from a 1952 result of Marshall Hall, Jr. on abelian groups that any collection of half as many requests as coded symbols in the doubled binary simplex code can be served by this code. This implies the fractional version of the Functional Batch Code Conjecture that allows half-servers.
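To make "serving" concrete, the brute-force sketch below assumes the usual functional batch code model: each server stores one vector over $\mathbb{F}_2^k$ (encoded as an integer bitmask), and each request must be recovered from its own disjoint set of servers whose stored vectors XOR to the requested vector. It is exponential and only meant for tiny examples such as the doubled simplex code for $k = 2$ (servers storing 1, 2, 3, 1, 2, 3).

```python
from functools import reduce
from itertools import combinations
from operator import xor

def can_serve(servers, requests):
    """servers: list of int bitmasks; requests: list of nonzero int bitmasks.
    True if every request gets a disjoint recovery set of servers XOR-ing to it."""
    def backtrack(i, available):
        if i == len(requests):
            return True
        target = requests[i]
        avail = sorted(available)
        for size in range(1, len(avail) + 1):
            for subset in combinations(avail, size):
                if reduce(xor, (servers[j] for j in subset), 0) == target:
                    # Servers used here are unavailable for the remaining requests
                    if backtrack(i + 1, available - set(subset)):
                        return True
        return False
    return backtrack(0, frozenset(range(len(servers))))
```

With six coded symbols, three equal requests for vector 3 are served by the two copies of 3 plus the pair {1, 2}, while five such requests would need at least eight servers and cannot be served.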

[276]  arXiv:2405.02113 [pdf, ps, other]
Title: A Workflow for GLAM Metadata Crosswalk
Comments: Submitted to AIUCD conference 2024; 1 figure, 8 pages
Subjects: Digital Libraries (cs.DL)

The acquisition of physical artifacts not only involves transferring existing information into the digital ecosystem but also generates information as a process in itself, underscoring the importance of meticulous management of FAIR data and metadata. In addition, the diversity of objects within the cultural heritage domain is reflected in a multitude of descriptive models. The digitization process expands the opportunities for exchange and joint use, provided that the descriptive schemas are made interoperable in advance. To achieve this goal, we propose a replicable workflow for metadata schema crosswalks that facilitates the preservation and accessibility of cultural heritage in the digital ecosystem. This work presents a methodology for metadata generation and management in the case study of the digital twin of the temporary exhibition "The Other Renaissance - Ulisse Aldrovandi and the Wonders of the World". The workflow delineates a systematic, step-by-step transformation of tabular data into RDF format to enhance Linked Open Data. The methodology adopts the RDF Mapping Language (RML) for converting data to RDF, with human involvement. This last aspect entails interaction between digital humanists and domain experts through surveys, leading to the abstraction and reformulation of domain-specific knowledge to be exploited in formalizing and converting information.
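The core tabular-to-RDF step can be illustrated without RML. The sketch below uses only the Python standard library to turn CSV rows into N-Triples through a column-to-predicate mapping; the base IRI is hypothetical, the Dublin Core terms are just example predicates, and a real crosswalk would express this mapping declaratively in RML instead.

```python
import csv
import io

BASE_IRI = "https://example.org/object/"  # hypothetical namespace for catalogue objects

def escape_literal(value):
    # Escape characters that may not appear verbatim in N-Triples string literals
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def csv_to_ntriples(csv_text, id_column, predicate_map):
    """predicate_map: CSV column name -> predicate IRI."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE_IRI}{row[id_column]}>"
        for column, predicate in predicate_map.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples
```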

[277]  arXiv:2405.02114 [pdf, other]
Title: Probabilistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation
Comments: ICME 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and the ill-posed nature of 2D-to-3D lifting, which has drawn considerable attention to Multi-Hypothesis HPE (MH-HPE) research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the single-hypothesis HPE model and then reverse-maps this distribution to the 2D pose input through an adaptive noise sampling strategy, effectively generating reasonable multi-hypothesis samples. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com/xzhouzeng/PRPose.
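A minimal sketch of turning a single-hypothesis lifter into a multi-hypothesis one by sampling noise on the 2D input, assuming Gaussian noise with a given per-joint standard deviation; how PRPose fits and adapts that distribution is not shown, and `lift_fn` stands in for any hypothetical 2D-to-3D model.

```python
import numpy as np

def multi_hypothesis_lift(pose_2d, lift_fn, joint_std, num_hypotheses, seed=0):
    """pose_2d: (J, 2) keypoints; joint_std: (J,) per-joint noise scale;
    lift_fn: maps a (J, 2) pose to a (J, 3) pose. Returns (H, J, 3)."""
    rng = np.random.default_rng(seed)
    # Perturb every joint independently, scaled by its own standard deviation
    noise = rng.normal(size=(num_hypotheses,) + pose_2d.shape) * joint_std[None, :, None]
    return np.stack([lift_fn(pose_2d + n) for n in noise])
```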

[278]  arXiv:2405.02119 [pdf, other]
Title: Can We Identify Unknown Audio Recording Environments in Forensic Scenarios?
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of recorded audio with its recording location. For example, a voice message may be the only investigative cue for narrowing down the candidate sites of a crime. Several existing works provide tools for closed-set recording environment classification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, closed-set tools are not applicable without retraining on a sufficient number of training samples for each case and its respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality.
In this work, we therefore take a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining; instead, it is the first tool for robust few-shot classification of unseen environment locations. We demonstrate that EnvId can handle forensically challenging material. It provides good-quality predictions even under unseen signal degradations, environment characteristics, or recording position mismatches.
Our code and datasets will be made publicly available upon acceptance.
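Few-shot classification without retraining is often done by comparing a query embedding with per-class prototypes built from a handful of labelled examples. A minimal sketch, assuming embeddings from some trained encoder and nearest-prototype matching by cosine similarity; this is a generic technique, not necessarily EnvId's classifier, and the location names are hypothetical.

```python
import numpy as np

def build_prototypes(support_embeddings, support_labels):
    # One prototype per candidate location: the mean of its support embeddings
    labels = sorted(set(support_labels))
    protos = np.stack([
        np.mean([e for e, l in zip(support_embeddings, support_labels) if l == lab], axis=0)
        for lab in labels
    ])
    return labels, protos

def classify(query, labels, protos):
    # Nearest prototype under cosine similarity
    q = query / np.linalg.norm(query)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return labels[int(np.argmax(p @ q))]
```

Adding a new case only requires embedding a few recordings per candidate location, so the encoder itself never needs retraining.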

[279]  arXiv:2405.02121 [pdf, other]
Title: Accurate Pose Prediction on Signed Distance Fields for Mobile Ground Robots in Rough Terrain
Comments: Published in: 2023 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). Video: this https URL
Journal-ref: 2023 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Naraha, Fukushima, Japan, 2023, pp. 47-52
Subjects: Robotics (cs.RO)

Autonomous locomotion tasks for mobile ground robots in unstructured environments, such as waypoint navigation or flipper control, require a sufficiently accurate prediction of the robot-terrain interaction. Heuristics such as occupancy grids or traversability maps are widely used but limit the actions available to robots with active flippers, because joint positions are not taken into account. We present a novel iterative geometric method to predict the 3D pose of mobile ground robots with active flippers on uneven ground with high accuracy and online planning capabilities. This is achieved by exploiting the ability of signed distance fields to represent surfaces with sub-voxel accuracy. The effectiveness of the presented approach is demonstrated on two different tracked robots in simulation and on a real platform. Compared against a tracking system as ground truth, our method predicts the robot position and orientation with an average accuracy of 3.11 cm and 3.91°, outperforming a recent heightmap-based approach. The implementation is available as an open-source ROS package.
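Sub-voxel accuracy comes from the fact that a signed distance field can be interpolated between grid samples. A minimal sketch of trilinear interpolation on a regular SDF grid, assuming grid sample `(i, j, k)` sits at `(i, j, k) * voxel_size` and the query point lies inside the grid; the authors' ROS package may use a different interpolation scheme.

```python
import numpy as np

def trilinear_sdf(grid, voxel_size, point):
    """Interpolate the signed distance at a continuous 3D point from a voxel grid."""
    p = np.asarray(point, dtype=float) / voxel_size
    i0 = np.clip(np.floor(p).astype(int), 0, np.array(grid.shape) - 2)  # lower corner voxel
    t = p - i0                                                          # fractional offsets in [0, 1]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1 - t[0])
                     * (t[1] if dy else 1 - t[1])
                     * (t[2] if dz else 1 - t[2]))
                value += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value
```

Because the interpolant is exact for linear fields, a flat surface is recovered at its true height even when it falls between voxel centres.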

[280]  arXiv:2405.02128 [pdf, ps, other]
Title: Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo
Subjects: Computation and Language (cs.CL); Materials Science (cond-mat.mtrl-sci)

The rapid advancement in artificial intelligence and natural language processing has led to the development of large-scale datasets aimed at benchmarking the performance of machine learning models. Herein, we introduce 'RetChemQA,' a comprehensive benchmark dataset designed to evaluate the capabilities of such models in the domain of reticular chemistry. This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type. The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group, among others. The dataset has been generated using OpenAI's GPT-4 Turbo, a cutting-edge model known for its exceptional language understanding and generation capabilities. In addition to the Q&A dataset, we also release a dataset of synthesis conditions extracted from the corpus of literature used in this study. The aim of RetChemQA is to provide a robust platform for the development and evaluation of advanced machine learning algorithms, particularly for the reticular chemistry community. The dataset is structured to reflect the complexities and nuances of real-world scientific discourse, thereby enabling nuanced performance assessments across a variety of tasks. The dataset is available at the following link: https://github.com/nakulrampal/RetChemQA

[281]  arXiv:2405.02132 [pdf, other]
Title: Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research conducts an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules in the context of the speech foundation encoder-LLM ASR paradigm. Furthermore, we introduce a three-stage training approach, expressly developed to enhance the model's ability to align auditory and textual information. The implementation of this approach, alongside the strategic integration of ASR components, enabled us to achieve state-of-the-art (SOTA) performance on the AISHELL-1, TestNet, and TestMeeting test sets. Our analysis provides an empirical foundation for future research in LLM-based ASR systems and offers insights into optimizing performance using Chinese datasets. We will publicly release all scripts used for data preparation, training, inference, and scoring, as well as pretrained models and training logs, to promote reproducible research.
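A projector module maps speech-encoder frames into the LLM's embedding space. A minimal sketch of one common design, assumed here for illustration rather than taken from the paper: concatenate every `stride` consecutive frames (shortening the sequence) and apply a linear layer; `weight` and `bias` would normally be learned.

```python
import numpy as np

def project_speech_features(frames, weight, bias, stride):
    """frames: (T, d_enc); weight: (stride * d_enc, d_llm); bias: (d_llm,).
    Returns (T // stride, d_llm) embeddings for the LLM."""
    T, d = frames.shape
    usable = (T // stride) * stride  # drop trailing frames that do not fill a group
    stacked = frames[:usable].reshape(T // stride, stride * d)  # concatenate neighbouring frames
    return stacked @ weight + bias
```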

[282]  arXiv:2405.02133 [pdf, other]
Title: Learning from Evolution: Improving Collective Decision-Making Mechanisms using Insights from Evolutionary Robotics
Subjects: Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

Collective decision-making enables multi-robot systems to act autonomously in real-world environments. Existing collective decision-making mechanisms suffer from the so-called speed versus accuracy trade-off or rely on high complexity, e.g., by including global communication. Recent work has shown that more efficient collective decision-making mechanisms based on artificial neural networks can be generated using methods from evolutionary computation. A major drawback of these decision-making neural networks is their limited interpretability. Analyzing evolved decision-making mechanisms can help us improve the efficiency of hand-coded decision-making mechanisms while maintaining higher interpretability. In this paper, we analyze evolved collective decision-making mechanisms in detail and hand-code two new decision-making mechanisms based on the insights gained. In benchmark experiments, we show that the newly implemented collective decision-making mechanisms are more efficient than two state-of-the-art mechanisms, the voter model and the majority rule.
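For reference, the two baseline mechanisms differ only in how a robot updates its opinion from its neighbours. A minimal sketch of both update rules, assuming opinions are stored per robot and the majority is taken over the robot and its neighbours with random tie-breaking (variants exist in the literature).

```python
import random

def voter_model_update(opinions, agent, neighbors, rng):
    """Voter model: adopt the opinion of one uniformly random neighbour."""
    return opinions[rng.choice(neighbors)]

def majority_rule_update(opinions, agent, neighbors, rng):
    """Majority rule: adopt the most common opinion among the agent and its neighbours."""
    group = [opinions[agent]] + [opinions[n] for n in neighbors]
    counts = {}
    for o in group:
        counts[o] = counts.get(o, 0) + 1
    best = max(counts.values())
    # Break ties uniformly at random among the most frequent opinions
    return rng.choice(sorted(o for o, c in counts.items() if c == best))
```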

[283]  arXiv:2405.02134 [pdf, other]
Title: Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
Subjects: Computation and Language (cs.CL)

Researchers and practitioners operating on a limited budget face a cost-performance trade-off. The challenging decision often centers on whether to use a large LLM with better performance or a smaller one with lower costs. This has motivated recent research on optimising LLM calls. Either a cascading strategy is used, in which the small LLM is called first and the large LLM may be called afterwards (so either the small model or both models are called sequentially), or a routing strategy is used, in which only one model is ever called. Both strategies depend on a decision criterion that is typically implemented by an additional neural model. In this work, we propose a simpler solution: we use only the uncertainty of the small LLM's generations as the decision criterion. We compare our approach with both cascading and routing strategies using three different pairs of pre-trained small and large LLMs, on nine different tasks, and against approaches that require an additional neural model. Our experiments reveal that this simple solution optimally balances cost and performance, outperforming existing methods in 25 out of 27 experimental setups.
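A minimal sketch of the two-tier decision, assuming the small model returns its generated text together with per-token log-probabilities and using the mean negative log-likelihood as the uncertainty score; the paper's exact uncertainty measure and threshold choice may differ, and `small_llm` and `large_llm` are hypothetical callables.

```python
def two_tier_answer(prompt, small_llm, large_llm, threshold):
    """small_llm(prompt) -> (text, token_logprobs); large_llm(prompt) -> text."""
    text, token_logprobs = small_llm(prompt)
    # Mean negative log-likelihood of the generated tokens as the uncertainty score
    uncertainty = -sum(token_logprobs) / max(len(token_logprobs), 1)
    if uncertainty <= threshold:
        return text, "small"          # confident enough: keep the cheap answer
    return large_llm(prompt), "large"  # uncertain: pay for the large model
```

The threshold directly trades cost for quality: lowering it sends more prompts to the large model.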

[284]  arXiv:2405.02139 [pdf, other]
Title: Multi-rate Runge-Kutta methods: stability analysis and applications
Subjects: Numerical Analysis (math.NA)

We present an approach for the efficient implementation of self-adjusting multi-rate Runge-Kutta methods, and we extend the previously available stability analyses of these methods to the case of an arbitrary number of sub-steps for the active components. We propose a physically motivated model problem that can be used to assess the stability of different multi-rate versions of standard Runge-Kutta methods and the impact of different interpolation methods for the latent variables. Finally, we present the results of several numerical experiments, performed with implementations of the proposed methods in the \textit{OpenModelica} open-source modelling and simulation software, which demonstrate the efficiency gains resulting from the proposed multi-rate approach for physical modelling problems with multiple time scales.
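A minimal sketch of the multi-rate idea for a two-component system, assuming a "slowest-first" scheme built on Heun's method (a 2-stage Runge-Kutta method): the slow (latent) component takes one macro step of size `H`, and the fast (active) component is re-integrated with `m` sub-steps while the latent value is linearly interpolated. This illustrates the general technique, not the OpenModelica implementation.

```python
def multirate_heun_step(f_slow, f_fast, t, ys, yf, H, m):
    """f_slow, f_fast: functions (t, ys, yf) -> derivative. Returns (ys_new, yf_new)."""
    # 1) Macro step: one Heun step of size H for the slow component
    #    (the fast component's macro-step prediction is only used inside this step)
    k1s, k1f = f_slow(t, ys, yf), f_fast(t, ys, yf)
    k2s = f_slow(t + H, ys + H * k1s, yf + H * k1f)
    ys_new = ys + 0.5 * H * (k1s + k2s)

    # 2) Latent slow variable: linear interpolation over the macro step
    def slow_at(tau):
        return ys + (tau - t) / H * (ys_new - ys)

    # 3) Active fast component: m Heun sub-steps of size h = H / m
    h = H / m
    yf_new = yf
    for i in range(m):
        tau = t + i * h
        k1 = f_fast(tau, slow_at(tau), yf_new)
        k2 = f_fast(tau + h, slow_at(tau + h), yf_new + h * k1)
        yf_new = yf_new + 0.5 * h * (k1 + k2)
    return ys_new, yf_new
```

For ys' = -ys and yf' = -50 yf with H = 0.1, a single-rate Heun step has amplification factor 1 - 5 + 12.5 = 8.5 for the fast component and is unstable, whereas 10 sub-steps (factor 0.625 each) keep it stable while the slow component still uses the large step.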

[285]  arXiv:2405.02140 [pdf, other]
Title: An Information Theoretic Perspective on Conformal Prediction
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information-theoretic inequalities. Moreover, we demonstrate two direct and useful applications of this connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing that our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.
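For readers new to CP, a minimal split-conformal sketch for classification: nonconformity scores are computed on a held-out calibration set, a finite-sample-corrected quantile gives the threshold, and every label whose score falls under it enters the prediction set. The score 1 - p(label) is one common choice, assumed here for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold from calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points: return the full label set
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, threshold):
    # Nonconformity score of a candidate label = 1 - its predicted probability
    return [label for label, p in enumerate(probs) if 1 - p <= threshold]
```

Under exchangeability, the resulting set contains the true label with probability at least 1 - alpha; the average set size is the "inefficiency" mentioned above.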

[286]  arXiv:2405.02141 [pdf, other]
Title: Multi-Objective Recommendation via Multivariate Policy Learning
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, exactly how these weights are computed is key to success for any online platform. We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance to design stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations and from offline and online experiments highlight the efficacy of our deployed approach.
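The scalarisation step itself is simple; what the paper learns is the weight vector. A minimal sketch of ranking items by a weighted sum of per-objective predicted rewards, with hypothetical item names and rewards; choosing the weights (the policy's action) is not shown.

```python
def scalarised_scores(item_rewards, weights):
    """item_rewards: dict item -> list of per-objective predicted rewards."""
    return {item: sum(w * r for w, r in zip(weights, rewards))
            for item, rewards in item_rewards.items()}

def rank(item_rewards, weights):
    # Highest scalarised score first
    scores = scalarised_scores(item_rewards, weights)
    return sorted(scores, key=scores.get, reverse=True)
```

Shifting weight from one objective to another can reorder the whole list, which is why the choice of weights matters so much for the North Star reward.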

[287]  arXiv:2405.02144 [pdf, other]
Title: MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain
Authors: Chao Jiang, Wei Xu
Subjects: Computation and Language (cs.CL)

Medical texts are notoriously challenging to read. Properly measuring their readability is the first step towards making them more accessible. In this paper, we present a systematic study on fine-grained readability measurements in the medical domain at both sentence-level and span-level. We introduce a new dataset MedReadMe, which consists of manually annotated readability ratings and fine-grained complex span annotation for 4,520 sentences, featuring two novel "Google-Easy" and "Google-Hard" categories. It supports our quantitative analysis, which covers 650 linguistic features and automatic complex word and jargon identification. Enabled by our high-quality annotation, we benchmark and improve several state-of-the-art sentence-level readability metrics for the medical domain specifically, which include unsupervised, supervised, and prompting-based methods using recently developed large language models (LLMs). Informed by our fine-grained complex span annotation, we find that adding a single feature, capturing the number of jargon spans, to existing readability formulas can significantly improve their correlation with human judgments. We will publicly release the dataset and code.
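A minimal sketch of adding a jargon-count feature to a classic readability formula, using the Flesch-Kincaid grade level for a single sentence with a crude vowel-group syllable heuristic; the jargon count would come from span annotation or a jargon detector, and `jargon_weight` is a hypothetical coefficient that would in practice be fitted against human ratings.

```python
import re

def count_syllables(word):
    # Crude heuristic: one syllable per group of consecutive vowels, at least one per word
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(sentence):
    """Flesch-Kincaid grade level of a single sentence."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        raise ValueError("sentence contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) + 11.8 * syllables / len(words) - 15.59

def jargon_augmented_grade(sentence, num_jargon_spans, jargon_weight=1.0):
    # Existing formula plus one extra term for the number of jargon spans
    return fk_grade(sentence) + jargon_weight * num_jargon_spans
```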

[288]  arXiv:2405.02145 [pdf, other]
Title: Characterized Diffusion and Spatial-Temporal Interaction Network for Trajectory Prediction in Autonomous Driving
Comments: Accepted by IJCAI 2024
Subjects: Robotics (cs.RO)

Trajectory prediction is a cornerstone of autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module, an innovative module designed to simulate traffic scenarios with inherent uncertainty. This module enriches the predictive process by infusing it with detailed semantic information, thereby enhancing trajectory prediction accuracy. Complementing this, our Spatio-Temporal (ST) Interaction Module captures the nuanced effects of traffic scenarios on vehicle dynamics across both spatial and temporal dimensions with remarkable effectiveness. Exhaustive evaluations show that our model sets a new standard in trajectory prediction, achieving state-of-the-art (SOTA) results on the Next Generation Simulation (NGSIM), Highway Drone (HighD), and Macao Connected Autonomous Driving (MoCAD) datasets across both short and extended temporal spans. This performance underscores the model's unparalleled adaptability and efficacy in navigating complex traffic scenarios, including highways, urban streets, and intersections.

[289]  arXiv:2405.02147 [pdf, other]
Title: Payout Races and Congested Channels: A Formal Analysis of Security in the Lightning Network
Comments: 16 pages, 7 figures, to appear at ACM CCS 2024
Subjects: Cryptography and Security (cs.CR)

The Lightning Network, a payment channel network with a market cap of over 192M USD, is designed to resolve Bitcoin's scalability issues through fast off-chain transactions. There are multiple Lightning Network client implementations, all of which conform to the same textual specifications known as BOLTs. Several vulnerabilities have been manually discovered, but to date there have been few works systematically analyzing the security of the Lightning Network.
In this work, we take a foundational approach to analyzing the security of the Lightning Network with the help of formal methods. Based on the BOLTs' specifications, we build a detailed formal model of the Lightning Network's single-hop payment protocol and verify it using the Spin model checker. Our model captures both concurrency and error semantics of the payment protocol. We then define several security properties that capture the correct intermediate operation of the protocol, ensuring that the outcome is always certain to both channel peers. Using them, we rediscover a known attack previously reported in the literature as well as a novel attack, which we call a Payout Race. A Payout Race consists of a particular sequence of events that can lead to an ambiguity in the protocol in which innocent users can unwittingly lose funds. We confirm the practicality of this attack by reproducing it in a local testbed environment.

[290]  arXiv:2405.02148 [pdf, ps, other]
Title: Towards a Formal Creativity Theory: Preliminary results in Novelty and Transformativeness
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Formalizing creativity-related concepts has been a long-term goal of Computational Creativity. To the same end, we explore Formal Learning Theory in the context of creativity. We provide an introduction to the main concepts of this framework and a re-interpretation of terms commonly found in creativity discussions, proposing formal definitions for novelty and transformational creativity. This formalisation marks the beginning of a research branch we call Formal Creativity Theory, exploring how learning can be included as preparation for exploratory behaviour and how learning is a key part of transformational creative behaviour. By employing these definitions, we argue that, while novelty is neither necessary nor sufficient for transformational creativity in general, when using an inspiring set, rather than a sequence of experiences, an agent actually requires novelty for transformational creativity to occur.

[291]  arXiv:2405.02150 [pdf, other]
Title: The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates
Comments: Manoel Horta Ribeiro, Tim R. Davidson, and Veniamin Veselovsky contributed equally to this work
Subjects: Computers and Society (cs.CY)

Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular, large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. Firstly, we obtain a lower bound for the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least $15.8\%$ of reviews were written with AI assistance. Secondly, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in $53.4\%$ of pairs the AI-assisted review scores higher than the human review ($p = 0.002$; relative difference in probability of scoring higher: $+14.4\%$ in favor of AI-assisted reviews). Thirdly, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were $4.9$ percentage points ($p = 0.024$) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process and offer a discussion on future implications of current trends.

[292]  arXiv:2405.02151 [pdf, other]
Title: GMP-ATL: Gender-augmented Multi-scale Pseudo-label Enhanced Adaptive Transfer Learning for Speech Emotion Recognition via HuBERT
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, there is still room to improve the performance of these methods. In this paper, we present GMP-ATL (Gender-augmented Multi-scale Pseudo-label Adaptive Transfer Learning), a novel HuBERT-based adaptive transfer learning framework for SER. Specifically, GMP-ATL first employs the pre-trained HuBERT, implementing multi-task learning and multi-scale k-means clustering to acquire frame-level gender-augmented multi-scale pseudo-labels. Then, to fully leverage both the obtained frame-level and the utterance-level emotion labels, we incorporate model retraining and fine-tuning methods to further optimize GMP-ATL. Experiments on IEMOCAP show that our GMP-ATL achieves superior recognition performance, with a weighted average recall (WAR) of 80.0\% and an unweighted average recall (UAR) of 82.0\%, surpassing state-of-the-art unimodal SER methods, while also yielding results comparable to multimodal SER approaches.
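The two reported metrics differ in how classes are weighted: WAR is the overall fraction of correctly classified utterances (per-class recall weighted by class frequency), while UAR averages per-class recall with equal weight, so it is not inflated by frequent emotions. A minimal sketch:

```python
from collections import Counter

def war_uar(y_true, y_pred):
    """Return (weighted average recall, unweighted average recall)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    war = correct / len(y_true)  # equals overall accuracy
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    uar = sum(hits[c] / n for c, n in support.items()) / len(support)  # mean per-class recall
    return war, uar
```

A classifier that always predicts the majority class of `[0, 0, 0, 1]` reaches WAR 0.75 but UAR only 0.5.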

[293]  arXiv:2405.02154 [pdf, other]
Title: Neural Context Flows for Learning Generalizable Dynamical Systems
Comments: 14 pages, 5 figures
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)

Neural Ordinary Differential Equations typically struggle to generalize to new dynamical behaviors created by parameter changes in the underlying system, even when the dynamics are close to previously seen behaviors. The issue gets worse when the changing parameters are unobserved, i.e., their value or influence is not directly measurable when collecting data. We introduce Neural Context Flow (NCF), a framework that encodes these unobserved parameters in a latent context vector supplied as input to a vector field. NCFs leverage the differentiability of the vector field with respect to the parameters, along with a first-order Taylor expansion, to allow any context vector to influence trajectories from other parameters. We validate our method and compare it to established Multi-Task and Meta-Learning alternatives, showing competitive performance in mean squared error for in-domain and out-of-distribution evaluation on the Lotka-Volterra, Glycolytic Oscillator, and Gray-Scott problems. This study holds practical implications for foundation models in science and related areas that benefit from conditional neural ODEs. Our code is openly available at https://github.com/ddrous/ncflow.
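The first-order Taylor expansion in context space can be sketched as follows. This is a minimal illustration under assumptions, not code from the linked repository. `vector_field` is a hypothetical toy field in which the context enters nonlinearly. The Jacobian-vector product with respect to the context is approximated here by a central difference; an autodiff framework would compute it exactly.

```python
import math

def vector_field(x, ctx):
    """Hypothetical toy vector field f(x, ctx); ctx plays the role of the latent context."""
    return [x[1], -math.sin(ctx[0] * x[0]) - ctx[1] * x[1]]

def taylor_context_field(f, x, ctx_i, ctx_j, h=1e-5):
    """First-order expansion of f(x, ctx_j) around ctx_i:
    f(x, ctx_i) + J_ctx f(x, ctx_i) @ (ctx_j - ctx_i).
    The Jacobian-vector product is approximated with a central difference along d."""
    d = [cj - ci for ci, cj in zip(ctx_i, ctx_j)]
    plus = f(x, [c + h * di for c, di in zip(ctx_i, d)])
    minus = f(x, [c - h * di for c, di in zip(ctx_i, d)])
    base = f(x, ctx_i)
    return [b + (p - m) / (2 * h) for b, p, m in zip(base, plus, minus)]

x = [0.5, 0.1]
approx = taylor_context_field(vector_field, x, [1.0, 0.2], [1.01, 0.21])
exact = vector_field(x, [1.01, 0.21])
print(approx, exact)  # close for nearby contexts; the error grows with |ctx_j - ctx_i|^2
```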

[294]  arXiv:2405.02155 [pdf, other]
Title: Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification
Authors: Siqi Yin, Lifan Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a novel framework for zero-shot learning (ZSL), i.e., recognizing new categories that are unseen during training, using a multi-model and multi-alignment integration method. Specifically, we propose three strategies to improve ZSL performance: 1) utilizing the extensive knowledge of ChatGPT and the image generation capabilities of DALL-E to create reference images that precisely describe unseen categories and classification boundaries, thereby alleviating the information bottleneck; 2) integrating the results of text-image and image-image alignment from CLIP, along with the image-image alignment results from DINO, to achieve more accurate predictions; 3) introducing an adaptive weighting mechanism based on confidence levels to aggregate the outcomes of the different prediction methods. Experimental results on multiple datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our model significantly improves classification accuracy compared to single-model approaches, achieving AUROC scores above 96% on all test datasets and surpassing 99% on CIFAR-10.
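The confidence-based weighting in strategy 3) could take many forms, and the abstract does not give the formula. This minimal sketch makes two assumptions: each method outputs a class-probability vector, and a method's confidence is its maximum probability. It normalises the confidences into weights and fuses the vectors with them.

```python
def aggregate_by_confidence(prob_lists):
    """Hypothetical confidence-weighted fusion of several class-probability vectors.
    Confidence of a method = its maximum probability (an assumption); weights are the
    confidences normalised to sum to 1, so the fused vector is still a distribution."""
    confidences = [max(p) for p in prob_lists]
    total = sum(confidences)
    weights = [c / total for c in confidences]
    n_classes = len(prob_lists[0])
    fused = [sum(w * p[k] for w, p in zip(weights, prob_lists)) for k in range(n_classes)]
    return fused, weights

# Hypothetical outputs of three predictors (e.g. CLIP text-image, CLIP image-image, DINO image-image).
fused, weights = aggregate_by_confidence([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.9, 0.05, 0.05]])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, weights, predicted_class)
```

Weighting by confidence lets a sharply peaked predictor dominate on inputs where it is certain. A learned or softmax-tempered weighting would be an equally plausible design.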

[295]  arXiv:2405.02156 [pdf, other]
Title: How to Diversify any Personalized Recommender? A User-centric Pre-processing approach
Subjects: Information Retrieval (cs.IR)

In this paper, we introduce a novel approach to improve the diversity of Top-N recommendations while maintaining recommendation performance. Our approach employs a user-centric pre-processing strategy aimed at exposing users to a wide array of content categories and topics. We personalize this strategy by selectively adding and removing a percentage of interactions from user profiles. This personalization ensures we remain closely aligned with user preferences while gradually introducing distribution shifts. Our pre-processing technique offers flexibility and can seamlessly integrate into any recommender architecture. To evaluate our approach, we run extensive experiments on two publicly available data sets for news and book recommendations. We test various standard and neural network-based recommender system algorithms. Our results show that our approach generates diverse recommendations, ensuring users are exposed to a wider range of items. Furthermore, leveraging pre-processed data for training leads to recommender systems achieving performance levels comparable to, and in some cases better than, those trained on original, unmodified data. Additionally, our approach promotes provider fairness by facilitating exposure to minority or niche categories.

[296]  arXiv:2405.02161 [pdf, other]
Title: Simulating the economic impact of rationality through reinforcement learning and agent-based modelling
Comments: 8 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Multiagent Systems (cs.MA); General Economics (econ.GN)

Agent-based models (ABMs) are simulation models used in economics to overcome some of the limitations of traditional frameworks based on general equilibrium assumptions. However, agents within an ABM follow predetermined, not fully rational, behavioural rules which can be cumbersome to design and difficult to justify. Here we leverage multi-agent reinforcement learning (RL) to expand the capabilities of ABMs with the introduction of fully rational agents that learn their policy by interacting with the environment and maximising a reward function. Specifically, we propose a 'Rational macro ABM' (R-MABM) framework by extending a paradigmatic macro ABM from the economic literature. We show that gradually substituting ABM firms in the model with RL agents, trained to maximise profits, allows for a thorough study of the impact of rationality on the economy. We find that RL agents spontaneously learn three distinct strategies for maximising profits, with the optimal strategy depending on the level of market competition and rationality. We also find that RL agents with independent policies, and without the ability to communicate with each other, spontaneously learn to segregate into different strategic groups, thus increasing market power and overall profits. Finally, we find that a higher degree of rationality in the economy always improves the macroeconomic environment as measured by total output; however, depending on the specific rational policy, this can come at the cost of higher instability. Our R-MABM framework is general, allows for stable multi-agent learning, and represents a principled and robust direction to extend existing economic simulators.

[297]  arXiv:2405.02162 [pdf, other]
Title: Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

In the field of robotics and computer vision, efficient and accurate semantic mapping remains a significant challenge due to the growing demand for intelligent machines that can comprehend and interact with complex environments. Conventional panoptic mapping methods, however, are limited by predefined semantic classes, thus making them ineffective for handling novel or unforeseen objects. In response to this limitation, we introduce the Unified Promptable Panoptic Mapping (UPPM) method. UPPM utilizes recent advances in foundation models to enable real-time, on-demand label generation using natural language prompts. By incorporating a dynamic labeling strategy into traditional panoptic mapping techniques, UPPM provides significant improvements in adaptability and versatility while maintaining high performance levels in map reconstruction. We demonstrate our approach on real-world and simulated datasets. Results show that UPPM can accurately reconstruct scenes and segment objects while generating rich semantic labels through natural language interactions. A series of ablation experiments validated the advantages of foundation model-based labeling over fixed label sets.

[298]  arXiv:2405.02165 [pdf, other]
Title: EEG2TEXT: Open Vocabulary EEG-to-Text Decoding with EEG Pre-Training and Multi-View Transformer
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Deciphering the intricacies of the human brain has captivated researchers for centuries. Recent strides in Brain-Computer Interface (BCI) technology, particularly using motor imagery, have restored motor functions such as reaching, grasping, and walking in paralyzed individuals. However, decoding natural language from brain signals remains a formidable challenge. Electroencephalography (EEG) is a non-invasive technique that records electrical activity in the brain through electrodes placed on the scalp. Previous studies of EEG-to-text decoding have achieved high accuracy on small closed vocabularies but still fall short when dealing with large open vocabularies. We propose a novel method, EEG2TEXT, to improve the accuracy of open-vocabulary EEG-to-text decoding. Specifically, EEG2TEXT leverages EEG pre-training to enhance the learning of semantics from EEG signals and proposes a multi-view transformer to model EEG signal processing by different spatial regions of the brain. Experiments show that EEG2TEXT outperforms state-of-the-art baseline methods by a large margin, up to 5% in absolute BLEU and ROUGE scores. EEG2TEXT shows great potential for a high-performance open-vocabulary brain-to-text system to facilitate communication.

[299]  arXiv:2405.02169 [pdf, ps, other]
Title: Transimpedance Amplifier with Automatic Gain Control Based on Memristors for Optical Signal Acquisition
Subjects: Hardware Architecture (cs.AR)

Transimpedance amplifiers (TIA) play a crucial role in various electronic systems, especially in optical signal acquisition. However, their performance is often hampered by saturation issues due to high input currents, leading to prolonged recovery times. This paper addresses this challenge by introducing a novel approach that uses memristive automatic gain control (AGC) to adjust the TIA's gain and enhance its dynamic range. We replace the typical feedback resistor of a TIA with a valence-change mechanism (VCM) memristor. This substitution enables the TIA to adapt to a broader range of input signals, leveraging the large OFF/ON resistance ratio of the memristor. This paper also presents the reading and resetting sub-circuits essential for monitoring and controlling the memristor's state. The proposed circuit is evaluated through SPICE simulations. Furthermore, we extend our evaluation to practical testing using a printed circuit board (PCB) integrating the TIA and memristor. We show a 40 dB increase in the dynamic range of our TIA memristor circuit compared to traditional resistor-based TIAs.
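The link between feedback resistance and dynamic range follows from the ideal TIA relation $V_{out} = -I_{in} R_f$. The largest current that does not saturate the output is $V_{sat}/R_f$. Switching the feedback element from a high resistance to a low one therefore extends the input range by their ratio, which is $20\log_{10}(R_{high}/R_{low})$ dB. The sketch below assumes an ideal op-amp. The resistance values are hypothetical and are not taken from the paper; they are chosen only to show that a 100:1 ratio corresponds to 40 dB.

```python
import math

def tia_output(i_in, r_f):
    """Ideal inverting TIA: V_out = -I_in * R_f (ideal op-amp assumed)."""
    return -i_in * r_f

def max_input_current(v_sat, r_f):
    """Largest input current magnitude before the output reaches |V_sat|."""
    return v_sat / r_f

def dynamic_range_extension_db(r_high, r_low):
    """Extra input range in dB (20*log10 for amplitude ratios) gained by
    switching the feedback resistance from r_high down to r_low."""
    return 20 * math.log10(r_high / r_low)

# Hypothetical memristor states: 1 MOhm (high-resistance) and 10 kOhm (low-resistance).
print(dynamic_range_extension_db(1e6, 1e4))
print(max_input_current(1.0, 1e6), max_input_current(1.0, 1e4))
```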

[300]  arXiv:2405.02171 [pdf, other]
Title: Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations
Comments: Accepted by IEEE TPAMI in 2024. Extended version of ECCV 2022 paper "Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations" (arXiv:2203.01325)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphones: (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. Particularly, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms. Firstly, considering the popularity of multiple cameras in modern smartphones, the more zoomed (telephoto) image can be naturally leveraged as the reference to guide the super-resolution (SR) of the less zoomed (ultra-wide) image, which gives us a chance to learn a deep network that performs SR from the dual zoomed observations (DZSR). Secondly, for self-supervised learning of DZSR, we take the telephoto image instead of an additional high-resolution image as the supervision information, and select a center patch from it as the reference to super-resolve the corresponding ultra-wide image patch. To mitigate the effect of the misalignment between the ultra-wide low-resolution (LR) patch and the telephoto ground-truth (GT) image during training, we first adopt patch-based optical flow alignment and then design an auxiliary-LR to guide the deformation of the warped LR features. To generate visually pleasing results, we present a local overlapped sliced Wasserstein loss to better represent the perceptual difference between GT and output in the feature space. During testing, DZSR can be directly deployed to super-resolve the whole ultra-wide image with the reference of the telephoto image. In addition, we further take multiple zoomed observations to explore self-supervised RefSR, and present a progressive fusion scheme for the effective utilization of reference images. Experiments show that our methods achieve better quantitative and qualitative performance than state-of-the-art methods. Code is available at https://github.com/cszhilu1998/SelfDZSR_PlusPlus.
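The sliced Wasserstein idea behind the proposed loss can be illustrated on two point sets. Project both sets onto random unit directions, sort the 1D projections, and compare them position by position. This minimal sketch assumes equal-size sets of feature vectors and plain Monte Carlo directions. It omits the "local overlapped" patching and the deep feature extractor described in the abstract, so it is not the authors' loss.

```python
import math
import random

def sliced_wasserstein(xs, ys, n_proj=64, seed=0):
    """Monte Carlo estimate of the squared sliced Wasserstein-2 distance between two
    equal-size point sets xs, ys (lists of d-dimensional feature vectors)."""
    assert len(xs) == len(ys) and xs
    d = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_proj):
        # Random direction on the unit sphere.
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # In 1D, optimal transport matches sorted samples.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(b * t for b, t in zip(y, theta)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_proj

feats = [[0.0, 1.0], [2.0, -1.0], [1.5, 0.5]]
print(sliced_wasserstein(feats, feats[::-1]))                      # order-invariant
print(sliced_wasserstein(feats, [[a + 1.0, b] for a, b in feats]))  # shifted set
```

Because the distance depends only on sorted projections, it compares feature distributions rather than pixel-aligned values. This property makes such losses attractive when ground truth and output are slightly misaligned.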

[301]  arXiv:2405.02173 [pdf, other]
Title: Task Synthesis for Elementary Visual Programming in XLogoOnline Environment
Comments: Accepted as a paper at the AIED'24 conference in the late-breaking results track
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)

In recent years, the XLogoOnline programming platform has gained popularity among novice learners. It integrates the Logo programming language with visual programming, providing a visual interface for learning computing concepts. However, XLogoOnline offers only a limited set of tasks, which is insufficient for learners to master computing concepts that require sufficient practice. To address this, we introduce XLogoSyn, a novel technique for synthesizing high-quality tasks of varying difficulty levels. Given a reference task, XLogoSyn can generate practice tasks at varying difficulty levels that cater to the different needs and abilities of learners. XLogoSyn achieves this by combining symbolic execution and constraint satisfaction techniques. Our expert study demonstrates the effectiveness of XLogoSyn. We have also deployed synthesized practice tasks into XLogoOnline, highlighting the educational benefits of these synthesized practice tasks.

[302]  arXiv:2405.02175 [pdf, other]
Title: Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset
Comments: Short paper
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Hoaxes are a recognised form of disinformation created deliberately, with potentially serious implications for the credibility of reference knowledge resources such as Wikipedia. What makes detecting Wikipedia hoaxes hard is that they are often written according to the official style guidelines. In this work, we first provide a systematic analysis of the similarities and discrepancies between legitimate and hoax Wikipedia articles, and introduce Hoaxpedia, a collection of 311 hoax articles (from existing literature as well as official Wikipedia lists) alongside semantically similar real articles. We report results of binary classification experiments in the task of predicting whether a Wikipedia article is real or hoax, and analyze several settings as well as a range of language models. Our results suggest that detecting deceitful content in Wikipedia based on content alone, despite not having been explored much in the past, is a promising direction.

[303]  arXiv:2405.02177 [pdf, other]
Title: Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation
Subjects: Robotics (cs.RO)

The majority of visual SLAM systems are not robust in dynamic scenarios. Those that do handle dynamic objects in the scene usually rely on deep-learning-based methods to detect and filter these objects. However, these methods cannot deal with unknown moving objects. This work presents Panoptic-SLAM, an open-source visual SLAM system robust to dynamic environments, even in the presence of unknown objects. It uses panoptic segmentation to filter dynamic objects from the scene during the state estimation process. Panoptic-SLAM is based on ORB-SLAM3, a state-of-the-art SLAM system for static environments. The implementation was tested using real-world datasets and compared with several state-of-the-art systems from the literature, including DynaSLAM, DS-SLAM, SaD-SLAM, PVO and FusingPanoptic. For example, Panoptic-SLAM is on average four times more accurate than PVO, the most recent panoptic-based approach for visual SLAM. Experiments were also performed using a quadruped robot with an RGB-D camera to test the applicability of our method in real-world scenarios. The results were validated against ground truth obtained with a motion capture system.
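The filtering step can be sketched as follows: feature points that fall on panoptic segments labelled as dynamic are discarded before pose estimation. This is a simplified illustration under three assumptions: a per-pixel segment-id map, a segment-to-class lookup, and a hypothetical set of dynamic class names. It does not reproduce how Panoptic-SLAM handles unknown objects or how it integrates with ORB-SLAM3.

```python
def filter_dynamic_keypoints(keypoints, segment_map, segment_class, dynamic_classes):
    """Keep keypoints (u, v) whose pixel does not lie on a dynamic panoptic segment.
    segment_map[v][u] is the segment id at column u, row v (hypothetical layout);
    segment_class maps segment ids to class names; dynamic_classes is a set of names."""
    kept = []
    for u, v in keypoints:
        seg_id = segment_map[v][u]
        if segment_class.get(seg_id) not in dynamic_classes:
            kept.append((u, v))
    return kept

# Toy 2x3 panoptic map: segment 1 is a person (dynamic), 0 a wall and 2 a chair.
seg_map = [[0, 0, 1],
           [0, 2, 1]]
classes = {0: "wall", 1: "person", 2: "chair"}
print(filter_dynamic_keypoints([(0, 0), (2, 0), (1, 1), (2, 1)], seg_map, classes, {"person"}))
```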

[304]  arXiv:2405.02178 [pdf, other]
Title: Assessing and Verifying Task Utility in LLM-Powered Applications
Comments: arXiv admin note: text overlap with arXiv:2402.09015
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The rapid development of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents, assisting humans in their daily tasks. However, a significant gap remains in assessing to what extent LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the need to verify the utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment, quantifying the utility of an application against the suggested criteria. We present a comprehensive analysis of the effectiveness and robustness of AgentEval on two open-source datasets: math problem solving and ALFWorld household tasks. For reproducibility purposes, we make the data, code and all the logs publicly available at https://bit.ly/3w3yKcS.

[305]  arXiv:2405.02179 [pdf, other]
Title: Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)

Generalization is a main issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which more and more accurate synthesis methods are developed, it is very important to design techniques that also work well on data they were not trained for. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework and fake audios are exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is necessary in training, cutting off any link with the generation method at the root, and ensuring full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake detection or speaker verification datasets. At detection time only a limited set of voice fragments of the identity under test is required. Experiments on several datasets widely used in the community show that detectors based on pre-trained models achieve excellent performance and show strong generalization ability, rivaling supervised methods on in-distribution data and largely outperforming them on out-of-distribution data.

[306]  arXiv:2405.02180 [pdf, other]
Title: A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, particularly as diverse low-carbon technologies are increasingly integrated. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers (the invertible linear layer and the invertible normalization layer), the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it shows superior scalability across different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlations of RLPs compared with deep generative models.
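Invertible layers are what make exact likelihoods possible in a flow. Each layer must have a cheap inverse and a tractable log-determinant of its Jacobian. The abstract does not define FCPFlow's invertible normalization layer, so the sketch below uses a hypothetical stand-in: an affine per-feature normalization with shift and scale parameters, its exact inverse, and the log-determinant term that enters the change-of-variables formula.

```python
import math

class InvertibleNorm:
    """Hypothetical affine per-feature normalisation y = (x - shift) / scale.
    Its Jacobian is diagonal, so log|det dy/dx| = -sum(log scale)."""

    def __init__(self, shift, scale):
        assert all(s > 0 for s in scale), "scales must be positive for invertibility"
        self.shift, self.scale = list(shift), list(scale)

    def forward(self, x):
        y = [(xi - m) / s for xi, m, s in zip(x, self.shift, self.scale)]
        log_det = -sum(math.log(s) for s in self.scale)
        return y, log_det

    def inverse(self, y):
        return [yi * s + m for yi, m, s in zip(y, self.shift, self.scale)]

layer = InvertibleNorm(shift=[1.0, -2.0], scale=[2.0, 0.5])
y, log_det = layer.forward([3.0, 0.0])
print(y, log_det, layer.inverse(y))
```

In a full flow, the per-layer `log_det` terms are summed and added to the base-density log-likelihood of the final output.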

[307]  arXiv:2405.02181 [pdf, other]
Title: Imitation Learning in Discounted Linear MDPs without exploration assumptions
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG)

We present a new algorithm for imitation learning in infinite-horizon linear MDPs, dubbed ILARL, which greatly improves the bound on the number of trajectories that the learner needs to sample from the environment. In particular, we remove exploration assumptions required in previous works and we improve the dependence on the desired accuracy $\epsilon$ from $\mathcal{O}(\epsilon^{-5})$ to $\mathcal{O}(\epsilon^{-4})$. Our result relies on a connection between imitation learning and online learning in MDPs with adversarial losses. For the latter setting, we present the first result for infinite-horizon linear MDPs, which may be of independent interest. Moreover, we are able to provide a stronger result for the finite-horizon case, where we achieve $\mathcal{O}(\epsilon^{-2})$. Numerical experiments with linear function approximation show that ILARL outperforms other commonly used algorithms.

[308]  arXiv:2405.02182 [pdf, other]
Title: Hybridizable discontinuous Galerkin methods for solving the two-fluid plasma model
Subjects: Numerical Analysis (math.NA); Plasma Physics (physics.plasm-ph)

The two-fluid plasma model has a wide range of timescales which must all be numerically resolved regardless of the timescale on which plasma dynamics occurs. The standard approach to numerically stiff systems is to use unconditionally stable implicit time advance methods. Hybridizable discontinuous Galerkin (HDG) methods have emerged as a powerful tool for solving stiff partial differential equations. The HDG framework combines the advantages of the discontinuous Galerkin (DG) method, such as high-order accuracy and flexibility in handling mixed hyperbolic/parabolic PDEs, with the advantage of classical continuous finite element methods in constructing small, numerically stable global systems that can be solved implicitly. In this research we quantify the numerical stability conditions for the two-fluid equations and demonstrate how HDG can be used to avoid the strict stability requirements while maintaining high-order accurate results.
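The scalar test equation $y' = -\lambda y$ with large $\lambda$, a generic stand-in for the fastest two-fluid timescales, shows why stiffness forces implicit methods. Forward Euler multiplies the solution by $(1 - \lambda\Delta t)$ each step and diverges once $\lambda\Delta t > 2$. Backward Euler multiplies it by $1/(1 + \lambda\Delta t)$ and decays for any $\Delta t > 0$. The sketch illustrates this textbook contrast only; it is not an HDG discretization.

```python
def forward_euler(lam, dt, steps, y0=1.0):
    """Explicit update y_{n+1} = (1 - lam*dt) * y_n for y' = -lam*y."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y

def backward_euler(lam, dt, steps, y0=1.0):
    """Implicit update: solve y_{n+1} = y_n - lam*dt*y_{n+1}, i.e. y_{n+1} = y_n / (1 + lam*dt)."""
    y = y0
    for _ in range(steps):
        y = y / (1 + lam * dt)
    return y

# Stiff case: lam*dt = 10, far beyond the explicit stability limit of 2.
print(forward_euler(1000.0, 0.01, 10), backward_euler(1000.0, 0.01, 10))
```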

[309]  arXiv:2405.02183 [pdf, other]
Title: Metalearners for Ranking Treatment Effects
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Efficiently allocating treatments with a budget constraint constitutes an important challenge across various domains. In marketing, for example, the use of promotions to target potential customers and boost conversions is limited by the available budget. While much research focuses on estimating causal effects, there is relatively limited work on learning to allocate treatments while considering the operational context. Existing methods for uplift modeling or causal inference primarily estimate treatment effects, without considering how this relates to a profit maximizing allocation policy that respects budget constraints. The potential downside of using these methods is that the resulting predictive model is not aligned with the operational context. Therefore, prediction errors are propagated to the optimization of the budget allocation problem, subsequently leading to a suboptimal allocation policy. We propose an alternative approach based on learning to rank. Our proposed methodology directly learns an allocation policy by prioritizing instances in terms of their incremental profit. We propose an efficient sampling procedure for the optimization of the ranking model to scale our methodology to large-scale data sets. Theoretically, we show how learning to rank can maximize the area under a policy's incremental profit curve. Empirically, we validate our methodology and show its effectiveness in practice through a series of experiments on both synthetic and real-world data.
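The incremental profit curve that the ranking objective targets can be computed directly when per-instance incremental profits are known, as in synthetic data. In this hypothetical sketch, instances are sorted by a model score. The curve records the cumulative incremental profit of treating the top-k instances for every k, and its area is normalised to the unit interval with the trapezoidal rule. A budget constraint then corresponds to reading the curve at the affordable k. The paper's sampling procedure and ranking loss are not shown.

```python
def incremental_profit_curve(scores, incremental_profit):
    """Cumulative incremental profit from treating the top-k instances ranked by score,
    for k = 0..n. Assumes true per-instance incremental profits are known (e.g. synthetic data)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = [0.0]
    for i in order:
        curve.append(curve[-1] + incremental_profit[i])
    return curve

def area_under_curve(curve):
    """Trapezoidal area under the curve, with the x-axis (fraction treated) scaled to [0, 1]."""
    n = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n

profit = [5.0, -2.0, 1.0]          # hypothetical incremental profit per instance
good = incremental_profit_curve([0.9, 0.1, 0.5], profit)
bad = incremental_profit_curve([0.1, 0.9, 0.5], profit)
print(good, area_under_curve(good), bad, area_under_curve(bad))
```

A ranking that puts high-profit instances first raises the curve early, which is exactly what a larger area rewards.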

[310]  arXiv:2405.02184 [pdf, other]
Title: Hybrid Lyapunov-based feedback stabilization of bipedal locomotion based on reference spreading
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)

We propose a hybrid formulation of the linear inverted pendulum model for bipedal locomotion, where the foot switches are triggered based on the center of mass position, removing the need for pre-defined footstep timings. Using a concept similar to reference spreading, we define nontrivial tracking error coordinates induced by our hybrid model. These coordinates enjoy desirable linear flow dynamics and rather elegant jump dynamics perturbed by a suitable extended class ${\mathcal K}_\infty$ function of the position error. We stabilize this hybrid error dynamics using a saturated feedback controller, selecting its gains by solving a convex optimization problem. We prove local asymptotic stability of the tracking error and provide a certified estimate of the basin of attraction, comparing it with a numerical estimate obtained from the integration of the closed-loop dynamics. Simulations on a full-body model of a real robot show the practical applicability of the proposed framework and its advantages with respect to a standard model predictive control formulation.
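The state-triggered foot switch can be illustrated with an open-loop simulation of the linear inverted pendulum $\ddot{x} = \omega^2 (x - p)$ with $\omega^2 = g/z_0$, where $p$ is the stance-foot position. This is a hypothetical sketch. Its switching rule is an assumption: switch when the centre of mass is `switch_offset` ahead of the foot, then place the next foot `step_length` further on. The parameter values and the semi-implicit Euler integration are also assumptions, and no tracking controller from the paper is included.

```python
def simulate_hybrid_lip(x0, v0, step_length, switch_offset, z0=0.8, g=9.81,
                        dt=1e-3, t_end=2.0):
    """Hybrid LIP: flow xdd = omega^2 (x - p); jump p <- p + step_length whenever
    x - p >= switch_offset (a state-triggered switch, no pre-defined step timing).
    Returns final position, final velocity and the switch times."""
    omega2 = g / z0
    x, v, p = x0, v0, 0.0
    switches = []
    t = 0.0
    while t < t_end:
        if x - p >= switch_offset:   # jump map
            p += step_length
            switches.append(round(t, 6))
        a = omega2 * (x - p)         # flow map
        v += a * dt                  # semi-implicit Euler: velocity first,
        x += v * dt                  # then position with the updated velocity
        t += dt
    return x, v, switches

x_end, v_end, switch_times = simulate_hybrid_lip(0.0, 0.5, step_length=0.3, switch_offset=0.15)
print(x_end, v_end, switch_times)
```

With `switch_offset` equal to half of `step_length`, the offset $x - p$ jumps from $+0.15$ to $-0.15$. The orbital energy $\dot{x}^2/2 - \omega^2 (x-p)^2/2$ is therefore unchanged by the jump, so the open-loop gait keeps a roughly constant average speed.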

[311]  arXiv:2405.02187 [pdf, other]
Title: X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD
Comments: To be published in ACM SIGGRAPH 2024
Subjects: Robotics (cs.RO)

We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters through Taylor series expansion within the complex domain. Our system allows for the real-time calculation of not just the gradient, but also higher-order differentiation. This facilitates the use of high-order optimizers to achieve better accuracy and faster convergence. Building on X-SLAM, we implemented end-to-end optimization frameworks for two important tasks: camera relocalization in wide outdoor scenes and active robotic scanning in complex indoor environments. Comprehensive evaluations on public benchmarks and intricate real scenes underscore the improvements in the accuracy of camera relocalization and the efficiency of robotic navigation achieved through our task-aware optimization. The code and data are available at https://gapszju.github.io/X-SLAM.
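The complex-step finite difference rests on the expansion $f(x + ih) = f(x) + ih f'(x) + O(h^2)$ for real-analytic $f$. It gives $f'(x) \approx \operatorname{Im} f(x + ih)/h$ with error $O(h^2)$. No subtraction of nearly equal numbers occurs, so $h$ can be made tiny (e.g. $10^{-20}$) without round-off error. The scalar sketch below shows only this textbook formula. It does not cover applying it to a SLAM pipeline, nor the higher-order derivatives mentioned in the abstract, which need more than this first-order formula.

```python
import cmath
import math

def complex_step_derivative(f, x, h=1e-20):
    """First derivative of a real-analytic f at real x via Im(f(x + i h)) / h."""
    return f(complex(x, h)).imag / h

def forward_difference(f, x, h=1e-8):
    """Ordinary forward difference for comparison; limited by subtractive cancellation."""
    return ((f(x + h) - f(x)) / h).real

f = lambda z: cmath.exp(z) * cmath.sin(z)          # must accept complex input
exact = math.exp(1.5) * (math.sin(1.5) + math.cos(1.5))
print(abs(complex_step_derivative(f, 1.5) - exact), abs(forward_difference(f, 1.5) - exact))
```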

[312]  arXiv:2405.02191 [pdf, ps, other]
Title: Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning
Comments: 4 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Peat, a crucial component in whisky production, imparts distinctive and irreplaceable flavours to the final product. However, the extraction of peat disrupts ancient ecosystems and releases significant amounts of carbon, contributing to climate change. This paper aims to address this issue by conducting a feasibility study on enhancing peat use efficiency in whisky manufacturing through non-destructive analysis using hyperspectral imaging. Results show that short-wave infrared (SWIR) data is more effective for analyzing peat samples and predicting total phenol levels, with accuracies up to 99.81%.

[313]  arXiv:2405.02195 [pdf, ps, other]
Title: Impact of emoji exclusion on the performance of Arabic sarcasm detection models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Detecting sarcasm in Arabic social media content is made harder by the diversity of the language and the nature of sarcastic expressions. There is a significant gap in the capability of existing models to effectively interpret sarcasm in Arabic, which calls for more sophisticated and precise detection methods. In this paper, we investigate the impact of a fundamental preprocessing component on sarcasm detection. While emojis play a crucial role in compensating for the absence of body language and facial expressions in modern communication, their impact on automated text analysis, particularly in sarcasm detection, remains underexplored. We investigate the impact of excluding emojis from datasets on the performance of sarcasm detection models for Arabic social media content, Arabic being a language with an exceptionally rich vocabulary. This investigation includes the adaptation and enhancement of AraBERT pre-trained models, specifically by excluding emojis, to improve sarcasm detection capabilities. We use AraBERT pre-training to refine the specified models, demonstrating that the removal of emojis can significantly boost the accuracy of sarcasm detection. This approach facilitates a more refined interpretation of language, eliminating the potential confusion introduced by non-textual elements. The evaluated AraBERT models, through the focused strategy of emoji removal, adeptly navigate the complexities of Arabic sarcasm. This study establishes new benchmarks in Arabic natural language processing and presents valuable insights for social media platforms.
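The emoji-exclusion preprocessing step can be sketched with a regular expression over common Unicode emoji blocks. The character ranges below are an assumption that covers the most frequent emoji. They are not an exhaustive list and not the authors' exact filter; production code might instead rely on a maintained emoji library.

```python
import re

# Hypothetical emoji pattern covering common Unicode emoji blocks; not exhaustive.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"   # pictographs, emoticons, transport, supplemental symbols
    "\U00002600-\U000027BF"   # miscellaneous symbols and dingbats (e.g. U+2764 heart)
    "\U0001F1E6-\U0001F1FF"   # regional indicator letters used in flag emoji
    "\uFE0F\u200D"            # variation selector-16 and zero-width joiner
    "]+"
)

def remove_emojis(text):
    """Replace emoji runs with spaces, then collapse the leftover whitespace."""
    return " ".join(EMOJI_PATTERN.sub(" ", text).split())

print(remove_emojis("هذا رائع 😂😂"))
```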

[314]  arXiv:2405.02196 [pdf, other]
Title: GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse
Subjects: Hardware Architecture (cs.AR)

Recently, tensor algebra has found significant applications across various domains. Each operator in tensor algebra features a different computational workload and precision. However, current general-purpose accelerators, such as VPUs, GPGPUs, and CGRAs, support tensor operators with low energy and area efficiency. This paper conducts an in-depth exploration of general-purpose accelerators for tensor processing.
First, we identify the similarity between matrix multiplication and precision multiplication, and create a method for classifying tensor operators. Then, building on these two findings, we introduce the systolic architecture into a general-purpose accelerator. On this basis, we propose a new General Tensor Accelerator (GTA), which has better area efficiency and data reuse. Furthermore, we create a large hardware scheduling space consisting of dataflow, precision and array resizing. Our evaluation results demonstrate that GTA achieves 7.76X, 5.35X, and 8.76X memory efficiency and 6.45X, 3.39X, and 25.83X speedup over VPU, GPGPU, and CGRA, respectively.
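The systolic architecture that GTA builds on can be illustrated with a cycle-level simulation of a generic output-stationary systolic array. In this dataflow each processing element (PE) accumulates one output, and operands are skewed so that `A[i][k]` and `B[k][j]` meet at PE `(i, j)` in cycle `i + j + k`. This textbook dataflow is chosen for illustration only; the abstract does not say which dataflow or array size GTA uses.

```python
def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary systolic array computing C = A @ B.
    PE (i, j) holds C[i][j]; rows of A stream rightward and columns of B stream downward,
    skewed so that A[i][k] and B[k][j] arrive at PE (i, j) in cycle t = i + j + k."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    cycles = M + N + K - 2            # last useful cycle is (M-1) + (N-1) + (K-1)
    for t in range(cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j         # which operand pair reaches PE (i, j) this cycle
                if 0 <= k < K:
                    C[i][j] += A[i][k] * B[k][j]
    return C, cycles

C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)
```

Each operand is fetched from memory once and then reused as it shifts through neighbouring PEs. This is the data-reuse property that motivates systolic designs.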

[315]  arXiv:2405.02198 [pdf, other]
Title: The Cambridge RoboMaster: An Agile Multi-Robot Research Platform
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

Compact robotic platforms with powerful compute and actuation capabilities are key enablers for practical, real-world deployments of multi-agent research. This article introduces a tightly integrated hardware, control, and simulation software stack on a fleet of holonomic ground robot platforms designed with this motivation. Our robots, a fleet of customised DJI Robomaster S1 vehicles, offer a balance between small robots that do not possess sufficient compute or actuation capabilities and larger robots that are unsuitable for indoor multi-robot tests. They run a modular ROS2-based optimal estimation and control stack for full onboard autonomy, contain ad-hoc peer-to-peer communication infrastructure, and can zero-shot run multi-agent reinforcement learning (MARL) policies trained in our vectorized multi-agent simulation framework. We present an in-depth review of other platforms currently available, showcase new experimental validation of our system's capabilities, and introduce case studies that highlight the versatility and reliability of our system as a testbed for a wide range of research demonstrations. Our system as well as supplementary material is available online: https://proroklab.github.io/cambridge-robomaster

[316]  arXiv:2405.02200 [pdf, other]
Title: Position Paper: Rethinking Empirical Research in Machine Learning: Addressing Epistemic and Methodological Challenges of Experimentation
Comments: Accepted for publication at ICML 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We warn against a common but incomplete understanding of empirical research in machine learning (ML) that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue that most current empirical ML research is fashioned as confirmatory research, while it should rather be considered exploratory.

[317]  arXiv:2405.02203 [pdf, other]
Title: Convergence of a Finite Volume Scheme for Compactly Heterogeneous Scalar Conservation Laws
Authors: Abraham Sylla
Subjects: Numerical Analysis (math.NA)

We build a finite volume scheme for the scalar conservation law $\partial_t u + \partial_x (H(x, u)) = 0$ with bounded initial condition for a wide class of flux functions $H$, convex with respect to the second variable. The main idea for the construction of the scheme is to use the theory of discontinuous flux. We prove that the resulting approximating sequence converges boundedly almost everywhere on $\mathopen]0, +\infty\mathclose[$ to the entropy solution.
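As a rough illustration of what a conservative finite volume update for this law looks like, the sketch below uses a generic local Lax-Friedrichs (Rusanov) interface flux on a periodic grid. It is not the discontinuous-flux scheme constructed in the paper; the flux H(x, u) = k(x) u^2 / 2, the coefficient k (which differs from 1 only on a compact interval), the grid, and the CFL factor 0.4 are all assumptions chosen for the example.

```python
import numpy as np

def rusanov_step(u, x_iface, H, dHdu, dx, dt):
    """One explicit finite volume step for u_t + (H(x, u))_x = 0 on a periodic grid.

    u[i] is the average over cell i; x_iface[i] is the right interface x_{i+1/2}.
    """
    u_right = np.roll(u, -1)  # u_{i+1}, with periodic wrap-around
    # Local bound on the wave speed |dH/du| at each interface
    a = np.maximum(np.abs(dHdu(x_iface, u)), np.abs(dHdu(x_iface, u_right)))
    # Rusanov numerical flux F_{i+1/2}
    flux = 0.5 * (H(x_iface, u) + H(x_iface, u_right)) - 0.5 * a * (u_right - u)
    # Conservative update: u_i - dt/dx * (F_{i+1/2} - F_{i-1/2})
    return u - dt / dx * (flux - np.roll(flux, 1))

# Hypothetical compactly heterogeneous flux: k(x) = 1 outside [0.3, 0.7]
def k(x):
    return 1.0 + 0.5 * (np.abs(x - 0.5) < 0.2)

def H(x, u):
    return k(x) * u**2 / 2

def dHdu(x, u):
    return k(x) * u

N = 200
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx      # cell centres on [0, 1]
x_iface = x + 0.5 * dx             # right interfaces
u = np.where(x < 0.3, 1.0, 0.2)    # bounded, Riemann-type initial datum
mass0 = u.sum() * dx

for _ in range(100):
    speed = np.maximum(np.abs(dHdu(x_iface, u)), np.abs(dHdu(x_iface, np.roll(u, -1))))
    dt = 0.4 * dx / speed.max()    # CFL-limited time step
    u = rusanov_step(u, x_iface, H, dHdu, dx, dt)
```

Because every interface flux is added to one cell and subtracted from its neighbour, the total mass u.sum() * dx is preserved up to rounding, which is the basic property any scheme of this family must have.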

[318]  arXiv:2405.02213 [pdf, other]
Title: Automatic Programming: Large Language Models and Beyond
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot, which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and examine the concerns around code quality, security, and related issues of programmer responsibility. These are key issues for organizations deciding whether to use automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward-looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of programs generated by LLMs can help produce higher-assurance code, along with evidence of that assurance.

[319]  arXiv:2405.02218 [pdf, other]
Title: Multispectral Fine-Grained Classification of Blackgrass in Wheat and Barley Crops
Comments: 19 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As the burden of herbicide resistance grows and the environmental repercussions of excessive herbicide use become clear, new ways of managing weed populations are needed. This is particularly true for cereal crops, like wheat and barley, that are staple food crops and occupy a globally significant portion of agricultural land. Even small improvements in weed management practices across these major food crops worldwide would yield considerable benefits for both the environment and global food security. Blackgrass is a major grass weed which causes particular problems in cereal crops in north-west Europe, a major cereal production area, because it has high levels of herbicide resistance and is well adapted to agronomic practice in this region. With the use of machine vision and multispectral imaging, we investigate the effectiveness of state-of-the-art methods to identify blackgrass in wheat and barley crops. As part of this work, we provide a large dataset with which we evaluate several key aspects of blackgrass weed recognition. Firstly, we determine the performance of different CNN and transformer-based architectures on images from unseen fields. Secondly, we demonstrate the role that different spectral bands play in the performance of weed classification. Lastly, we evaluate the role of dataset size in classification performance for each of the models trialled. We find that even with a fairly modest quantity of training data an accuracy of almost 90% can be achieved on images from unseen fields.

[320]  arXiv:2405.02219 [pdf, other]
Title: FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems
Authors: Yashar Deldjoo
Subjects: Information Retrieval (cs.IR)

This paper presents a framework for evaluating fairness in recommender systems powered by Large Language Models (RecLLMs), addressing the need for a unified approach that spans various fairness dimensions including sensitivity to user attributes, intrinsic fairness, and discussions of fairness based on underlying benefits. In addition, our framework introduces counterfactual evaluations and integrates diverse user group considerations to enhance the discourse on fairness evaluation for RecLLMs.
Our key contributions include the development of a robust framework for fairness evaluation in LLM-based recommendations and a structured method to create informative user profiles from demographic data, historical user preferences, and recent interactions. We argue that the latter is essential for enhancing personalization in such systems, especially in temporal-driven scenarios. We demonstrate the utility of our framework through practical applications on two datasets, LastFM-1K and ML-1M. We conduct experiments on a subsample of 80 users from each dataset, testing and assessing the effectiveness of various prompt construction scenarios and in-context learning, comprising more than 50 scenarios. This results in more than 4000 recommendations (80 * 50 = 4000). Our study reveals that while there are no significant unfairness issues in scenarios involving sensitive attributes, some concerns remain. However, in terms of intrinsic fairness, which does not involve direct sensitivity, unfairness across demographic groups remains significant. The code and data used for this paper are available at: https://shorturl.at/awBFM.

[321]  arXiv:2405.02220 [pdf, other]
Title: Designed Dithering Sign Activation for Binary Neural Networks
Comments: 7 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Binary Neural Networks emerged as a cost-effective and energy-efficient solution for computer vision tasks by binarizing either network weights or activations. However, common binary activations, such as the Sign activation function, abruptly binarize the values with a single threshold, losing fine-grained details in the feature outputs. This work proposes an activation that applies multiple thresholds following dithering principles, shifting the Sign activation function for each pixel according to a spatially periodic threshold kernel. Unlike existing methods, the shifting is defined jointly for a set of adjacent pixels, taking advantage of spatial correlations. Experiments on the classification task demonstrate the effectiveness of the designed dithering Sign activation function (DeSign) as an alternative activation for binary neural networks, without increasing the computational cost. Further, DeSign balances the preservation of details with the efficiency of binary operations.
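A minimal sketch of the idea of a spatially periodic threshold kernel, assuming a single-channel feature map. The 2x2 Bayer-style kernel below is a hypothetical choice made for illustration; the abstract does not give the kernel actually designed for DeSign, and the function name dithered_sign is also hypothetical.

```python
import numpy as np

def dithered_sign(x, kernel):
    """Binarize a 2D feature map with a spatially periodic threshold kernel.

    Pixel (i, j) is compared against kernel[i % kh, j % kw] instead of 0.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    reps = (-(-h // kh), -(-w // kw))           # ceil division
    thresholds = np.tile(kernel, reps)[:h, :w]  # periodic threshold map
    return np.where(x >= thresholds, 1.0, -1.0)

# Hypothetical kernel: a 2x2 Bayer pattern shifted to be centred on zero
bayer = np.array([[0, 2], [3, 1]])
kernel = (bayer + 0.5) / 4 - 0.5   # thresholds -0.375, 0.125, 0.375, -0.125

flat = np.full((4, 4), 0.2)
print(dithered_sign(flat, kernel).mean())  # 0.5, whereas a plain Sign (threshold 0) gives 1.0
```

The point of the example is that a flat region is no longer mapped to a single value: the local density of +1 outputs tracks the input level, which is the kind of fine-grained detail a single-threshold Sign discards.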

[322]  arXiv:2405.02221 [pdf, other]
Title: Discretization Error of Fourier Neural Operators
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

Operator learning is a variant of machine learning that is designed to approximate maps between function spaces from data. The Fourier Neural Operator (FNO) is a common model architecture used for operator learning. The FNO combines pointwise linear and nonlinear operations in physical space with pointwise linear operations in Fourier space, leading to a parameterized map acting between function spaces. Although FNOs formally involve convolutions of functions on a continuum, in practice the computations are performed on a discretized grid, allowing efficient implementation via the FFT. In this paper, the aliasing error that results from such a discretization is quantified and algebraic rates of convergence in terms of the grid resolution are obtained as a function of the regularity of the input. Numerical experiments that validate the theory and describe model stability are performed.
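The source of the aliasing error can be seen in a few lines: a pointwise nonlinearity in physical space creates higher Fourier modes, and on a grid of n points any mode above the Nyquist mode n/2 folds back onto a lower one. The sketch below uses squaring as a stand-in nonlinearity; it only illustrates the phenomenon and does not reproduce the paper's error bounds.

```python
import numpy as np

def spectrum_of_square(k, n):
    """Magnitude spectrum of cos(2*pi*k*x)**2 sampled on n equispaced grid points."""
    x = np.arange(n) / n
    u = np.cos(2 * np.pi * k * x)
    # The pointwise nonlinearity (squaring) produces the constant mode and mode 2k
    return np.abs(np.fft.rfft(u**2)) / n

fine = spectrum_of_square(3, 32)    # mode 6 is resolved on 32 points
coarse = spectrum_of_square(3, 8)   # mode 6 exceeds the Nyquist mode 4 and aliases to 8 - 6 = 2
print(np.argmax(fine[1:]) + 1, np.argmax(coarse[1:]) + 1)  # 6 2
```

On the coarse grid the energy that belongs to mode 6 is indistinguishable from mode 2, so a subsequent Fourier-space layer acts on the wrong frequency; refining the grid pushes such errors to higher, smaller modes when the input is smooth.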

[323]  arXiv:2405.02228 [pdf, other]
Title: REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs
Comments: Submitted to ACL ARR April 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, in which LLMs are asked to provide the author names of a given research article, and (b) Indirect Queries, in which LLMs are asked to provide the title of a mentioned article when given a sentence from a different article. To demonstrate where LLMs stand on this task, we introduce a large dataset called REASONS, comprising abstracts of the 12 most popular domains of scientific research on arXiv. From around 20K research articles, we make the following deductions on public and proprietary LLMs: (a) State-of-the-art models, often called anthropomorphic, GPT-4 and GPT-3.5, suffer from a high pass percentage (PP) to minimize the hallucination rate (HR). When tested with Perplexity.ai (7B), they unexpectedly made more errors; (b) Augmenting relevant metadata lowered the PP and gave the lowest HR; (c) Advance retrieval-augmented generation (RAG) using Mistral demonstrates consistent and robust citation support on indirect queries and matches the performance of GPT-3.5 and GPT-4. The HR across all domains and models decreased by an average of 41.93%, and the PP was reduced to 0% in most cases. In terms of generation quality, the average F1 Score and BLEU were 68.09% and 57.51%, respectively; (d) Testing with adversarial samples showed that LLMs, including the Advance RAG Mistral, struggle to understand context, but the extent of this issue was small in Mistral and GPT-4-Preview. Our study contributes valuable insights into the reliability of RAG for automated citation generation tasks.

[324]  arXiv:2405.02229 [pdf, other]
Title: On the Utility of External Agent Intention Predictor for Human-AI Coordination
Subjects: Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)

Reaching a consensus on team plans is vital to human-AI coordination. Although previous studies provide approaches through various forms of communication, it could still be hard to coordinate when the AI has no explainable plan to communicate. To cover this gap, we suggest incorporating external models to assist humans in understanding the intentions of AI agents. In this paper, we propose a two-stage paradigm that first trains a Theory of Mind (ToM) model from collected offline trajectories of the target agent, and then uses the model during human-AI collaboration by displaying the target agent's predicted future actions in real time. Such a paradigm leaves the AI agent as a black box and is thus applicable to any agent. To test our paradigm, we further implement a transformer-based predictor as the ToM model and develop an extended online human-AI collaboration platform for experiments. The comprehensive experimental results verify that human-AI teams can achieve better performance with the help of our model. A user assessment attached to the experiment further demonstrates that our paradigm can significantly enhance the situational awareness of humans. Our study presents the potential to augment the ability of humans via external assistance in human-AI collaboration, which may further inspire future research.

[325]  arXiv:2405.02232 [pdf, ps, other]
Title: From Proof Complexity to Circuit Complexity via Interactive Protocols
Comments: A conference version of this work is accepted to the 51st EATCS International Colloquium on Automata, Languages and Programming (ICALP 2024)
Subjects: Computational Complexity (cs.CC)

Folklore in complexity theory suspects that circuit lower bounds against $\mathbf{NC}^1$ or $\mathbf{P}/\operatorname{poly}$, currently out of reach, are a necessary step towards proving strong proof complexity lower bounds for systems like Frege or Extended Frege. Establishing such a connection formally, however, is already daunting, as it would imply the breakthrough separation $\mathbf{NEXP} \not\subseteq \mathbf{P}/\operatorname{poly}$, as recently observed by Pich and Santhanam (2023).
We show such a connection conditionally for the Implicit Extended Frege proof system ($\mathsf{iEF}$) introduced by Kraj\'i\v{c}ek (The Journal of Symbolic Logic, 2004), capable of formalizing most of contemporary complexity theory. In particular, we show that if $\mathsf{iEF}$ proves efficiently the standard derandomization assumption that a concrete Boolean function is hard on average for subexponential-size circuits, then any superpolynomial lower bound on the length of $\mathsf{iEF}$ proofs implies $\#\mathbf{P} \not\subseteq \mathbf{FP}/\operatorname{poly}$ (which would in turn imply, for example, $\mathbf{PSPACE} \not\subseteq \mathbf{P}/\operatorname{poly}$). Our proof exploits the formalization inside $\mathsf{iEF}$ of the soundness of the sum-check protocol of Lund, Fortnow, Karloff, and Nisan (Journal of the ACM, 1992). This has consequences for the self-provability of circuit upper bounds in $\mathsf{iEF}$. Interestingly, further improving our result seems to require progress in constructing interactive proof systems with more efficient provers.
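For readers unfamiliar with the sum-check protocol, the toy simulation below runs the textbook Lund-Fortnow-Karloff-Nisan interaction with an honest prover over a prime field. The modulus, the example polynomial, and the function names are assumptions of this sketch; the paper's actual contribution, formalizing the protocol's soundness inside iEF, is not something code can show.

```python
import itertools
import random

P = 2**31 - 1  # a Mersenne prime; the field size is an assumption of this sketch

def lagrange_eval(ys, r, p=P):
    """Evaluate at r the unique polynomial of degree < len(ys) with value ys[i] at i (mod p)."""
    total = 0
    for i, yi in enumerate(ys):
        num, den = 1, 1
        for j in range(len(ys)):
            if j != i:
                num = num * (r - j) % p
                den = den * (i - j) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p  # Fermat inverse of den
    return total

def sumcheck(g, n, deg, claim, seed=0, p=P):
    """Run sum-check for the claim sum_{x in {0,1}^n} g(x) == claim (mod p).

    The prover is honest; deg bounds the degree of g in each variable.
    Returns True if the verifier accepts.
    """
    rng = random.Random(seed)
    rs = []  # verifier's random challenges r_1, ..., r_i
    for i in range(n):
        rest = n - i - 1
        # Prover: values of g_i(X), summed over the remaining Boolean variables, at X = 0..deg
        ys = [sum(g(*rs, x, *b) for b in itertools.product((0, 1), repeat=rest)) % p
              for x in range(deg + 1)]
        # Verifier: consistency check g_i(0) + g_i(1) == current claim
        if (ys[0] + ys[1]) % p != claim % p:
            return False
        r = rng.randrange(p)
        claim = lagrange_eval(ys, r, p)  # new claim: g_i(r)
        rs.append(r)
    # Final check with a single evaluation (oracle access) of g at the random point
    return g(*rs) % p == claim % p

def g(x, y, z):
    return x * y + 2 * y * z + z  # degree 1 in each variable; sums to 10 over {0,1}^3

print(sumcheck(g, 3, 1, 10), sumcheck(g, 3, 1, 11))  # True False
```

The verifier does only n low-degree consistency checks and one evaluation of g, while the prover does the exponential summation; this asymmetry in prover cost is what the abstract's remark about "more efficient provers" refers to.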

[326]  arXiv:2405.02235 [pdf, other]
Title: Learning Optimal Deterministic Policies with Stochastic Policy Gradients
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG)

Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters. Stochastic controllers, however, are often undesirable from a practical perspective because of their lack of robustness, safety, and traceability. In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version. In this paper, we make a step towards the theoretical understanding of this practice. After introducing a novel framework for modeling this scenario, we study the global convergence to the best deterministic policy, under (weak) gradient domination assumptions. Then, we illustrate how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy. Finally, we quantitatively compare action-based and parameter-based exploration, giving a formal guise to intuitive results.
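The practice the abstract analyzes (learn a stochastic policy, then deploy its deterministic version) can be sketched on a one-step problem. Everything below is an assumption for illustration: the quadratic reward, the Gaussian action-based exploration with fixed sigma, the learning rate, and the batch size. It is a plain REINFORCE estimator, not the paper's algorithm or its rule for tuning the exploration level.

```python
import numpy as np

def learn_then_deploy(reward, theta0=0.0, sigma=0.5, lr=0.05, batch=1000, iters=200, seed=0):
    """Action-based exploration: learn the mean of a Gaussian policy a ~ N(theta, sigma^2)
    with a score-function (REINFORCE) gradient, then return the deterministic action theta."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(iters):
        a = theta + sigma * rng.standard_normal(batch)  # stochastic actions used for learning
        r = reward(a)
        score = (a - theta) / sigma**2                   # d/dtheta of log N(a; theta, sigma^2)
        grad = np.mean(score * (r - r.mean()))           # mean-reward baseline reduces variance
        theta += lr * grad
    return theta  # the deterministic policy that would actually be deployed

# Hypothetical one-step task whose best deterministic action is a = 2
theta = learn_then_deploy(lambda a: -(a - 2.0) ** 2)
```

Here the stochastic objective is -(theta - 2)^2 - sigma^2, so its maximizer coincides with the best deterministic action; in general a larger sigma speeds exploration but can bias the learned mean, which is the trade-off the paper studies.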

[327]  arXiv:2405.02237 [pdf, other]
Title: A second-order semi-Lagrangian exponential scheme with application to the shallow-water equations on the rotating sphere
Comments: 35 pages, 11 figures
Subjects: Numerical Analysis (math.NA)

In this work, we study and extend a class of semi-Lagrangian exponential methods, which combine exponential time integration techniques, suitable for integrating stiff linear terms, with a semi-Lagrangian treatment of nonlinear advection terms. Partial differential equations involving both processes arise for instance in atmospheric circulation models. Through a truncation error analysis, we first show that previously formulated semi-Lagrangian exponential schemes are limited to first-order accuracy due to the discretization of the linear term; we then formulate a new discretization leading to a second-order accurate method. Also, a detailed stability study, both considering a linear stability analysis and an empirical simulation-based one, is conducted to compare several Eulerian and semi-Lagrangian exponential schemes, as well as a well-established semi-Lagrangian semi-implicit method, which is used in operational atmospheric models. Numerical simulations of the shallow-water equations on the rotating sphere, considering standard and challenging benchmark test cases, are performed to assess the orders of convergence, stability properties, and computational cost of each method. The proposed second-order semi-Lagrangian exponential method is shown to be more stable and accurate than the previously formulated schemes of the same class at the expense of larger wall-clock times; however, compared to the well-established semi-Lagrangian semi-implicit method, it is more stable and has a similar cost; therefore, it is a competitive candidate for potential operational applications in atmospheric circulation modeling.

[328]  arXiv:2405.02238 [pdf, other]
Title: Secure and Efficient General Matrix Multiplication On Cloud Using Homomorphic Encryption
Comments: 10 pages, 7 figures, 4 tables
Subjects: Cryptography and Security (cs.CR)

Despite the cloud's enormous technical and financial advantages, security and privacy have always been the primary concerns in adopting cloud computing facilities, especially for government agencies and commercial sectors with high-security requirements. Homomorphic Encryption (HE) has recently emerged as an effective tool in assuring privacy and security for sensitive applications by allowing computing on encrypted data. One major obstacle to employing HE-based computation, however, is its excessive computational cost, which is several orders of magnitude higher than that of its plaintext counterpart. In this paper, we study how to reduce the HE-based computational cost of general Matrix Multiplication (MM), a fundamental building block for numerous practical applications, by taking advantage of the Single Instruction Multiple Data (SIMD) operation supported by HE schemes. Specifically, we develop a novel element-wise algorithm for general matrix multiplication, based on which we propose two HE-based General Matrix Multiplication (HEGMM) algorithms to reduce the HE computation cost. Our experimental results show that our algorithms can significantly outperform the state-of-the-art approaches of HE-based matrix multiplication.
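The SIMD style of computation mentioned above can be illustrated in plaintext with the well-known diagonal (Halevi-Shoup) encoding, in which a matrix-vector product needs only slot-wise multiplications, additions, and cyclic rotations, the operations packed HE schemes provide. In this sketch NumPy arrays stand in for ciphertext slots and np.roll for a slot rotation; no encryption is performed, and this is not the element-wise HEGMM algorithm proposed in the paper.

```python
import numpy as np

def diagonal_matvec(A, v):
    """Compute A @ v using only slot-wise products, additions, and cyclic rotations.

    The i-th generalized diagonal is d_i[j] = A[j, (j + i) % n]; then
    A @ v = sum_i d_i * rot(v, i), where rot(v, i)[j] = v[(j + i) % n].
    """
    n = A.shape[0]
    idx = np.arange(n)
    y = np.zeros(n)
    for i in range(n):
        diag = A[idx, (idx + i) % n]  # i-th generalized diagonal
        y += diag * np.roll(v, -i)    # np.roll(v, -i) rotates left by i slots
    return y

def diagonal_matmul(A, B):
    """Square A (n x n) times B (n x m), one diagonal matrix-vector product per column."""
    return np.column_stack([diagonal_matvec(A, B[:, j]) for j in range(B.shape[1])])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 3))
print(np.allclose(diagonal_matmul(A, B), A @ B))
```

Under HE, each rotation and each ciphertext multiplication is expensive, so the cost of such schemes is usually counted in rotations and multiplications; reducing those counts for general (non-square) shapes is the kind of saving the abstract targets.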

[329]  arXiv:2405.02240 [pdf, other]
Title: Subgraph2vec: A random walk-based algorithm for embedding knowledge graphs
Subjects: Machine Learning (cs.LG)

Graphs are an important data representation that occurs naturally in real-world applications \cite{goyal2018graph}. Therefore, analyzing graphs provides users with better insights in different areas such as anomaly detection \cite{ma2021comprehensive}, decision making \cite{fan2023graph}, clustering \cite{tsitsulin2023graph}, and classification \cite{wang2021mixup}. However, most of these methods require high levels of computational time and space. Other approaches, such as embedding, can reduce these costs. Knowledge graph (KG) embedding is a technique that aims to obtain a vector representation of a KG. It represents the entities and relations of a KG in a low-dimensional space while preserving their semantic meanings. There are different methods for embedding graphs, including random walk-based methods such as node2vec, metapath2vec, and regpattern2vec. However, most of these methods bias the walks based on a rigid pattern, usually hard-coded in the algorithm. In this work, we introduce subgraph2vec for embedding KGs, where walks are run inside a user-defined subgraph. We use this embedding for link prediction and show that our method performs better than the previous ones in most cases.
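A minimal sketch of random walks confined to a user-defined subgraph, assuming the subgraph is specified by a set of allowed entities and a set of allowed relation types. The function name, this way of specifying the subgraph, and the toy triples are hypothetical; the resulting token sequences would then typically be fed to a skip-gram model (for example gensim's Word2Vec) to obtain embeddings.

```python
import random

def subgraph_walks(triples, entities, relations, walk_length, walks_per_node, seed=0):
    """Random walks over a KG that never leave a user-defined subgraph.

    triples: iterable of (head, relation, tail); the subgraph is given by the
    allowed entity and relation sets. Walks alternate entity and relation tokens.
    """
    adj = {}
    for h, r, t in triples:
        if h in entities and t in entities and r in relations:  # keep only subgraph edges
            adj.setdefault(h, []).append((r, t))
    rng = random.Random(seed)
    walks = []
    for start in sorted(entities):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_length):
                out = adj.get(node)
                if not out:
                    break  # dead end inside the subgraph
                r, node = rng.choice(out)
                walk.extend([r, node])
            walks.append(walk)
    return walks

triples = [("alice", "knows", "bob"), ("bob", "worksAt", "acme"),
           ("alice", "livesIn", "paris"), ("bob", "knows", "alice")]
walks = subgraph_walks(triples, {"alice", "bob", "acme"}, {"knows", "worksAt"},
                       walk_length=4, walks_per_node=2)
```

Unlike a hard-coded metapath, the restriction here lives entirely in the data passed in, so changing the subgraph changes the walks without touching the walk logic.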

[330]  arXiv:2405.02241 [pdf, other]
Title: WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD
Comments: arXiv admin note: text overlap with arXiv:2211.09325
Subjects: Robotics (cs.RO)

We present a novel method for robotic manipulation tasks in human environments that require reasoning about the 3D geometric relationship between a pair of objects. Traditional end-to-end trained policies, which map from pixel observations to low-level robot actions, struggle to reason about complex pose relationships and have difficulty generalizing to unseen object configurations. To address these challenges, we propose a method that learns to reason about the 3D geometric relationship between objects, focusing on the relationship between key parts on one object with respect to key parts on another object. Our standalone model utilizes Weighted SVD to reason about both pose relationships between articulated parts and between free-floating objects. This approach allows the robot to understand the relationship between the oven door and the oven body, as well as the relationship between the lasagna plate and the oven, for example. By considering the 3D geometric relationship between objects, our method enables robots to perform complex manipulation tasks that reason about object-centric representations. We open source the code and demonstrate the results here.
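The "Weighted SVD" step can be read as the standard weighted Procrustes (Kabsch) solve: given corresponding points and per-point weights, it returns the rigid transform minimizing the weighted squared error in closed form. The sketch below shows only that solve; how the paper predicts correspondences and weights is not reproduced, and the function name and test data are hypothetical.

```python
import numpy as np

def weighted_procrustes(P, Q, w):
    """Rigid transform (R, t) minimizing sum_i w_i * ||R @ P[i] + t - Q[i]||^2.

    P, Q: (N, 3) corresponding points; w: (N,) non-negative weights.
    """
    w = w / w.sum()
    p_bar, q_bar = w @ P, w @ Q                      # weighted centroids
    S = (P - p_bar).T @ (w[:, None] * (Q - q_bar))   # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_bar - R @ p_bar
    return R, t

rng = np.random.default_rng(1)
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                               # make it a proper rotation
t_true = np.array([0.5, -1.0, 2.0])
P = rng.standard_normal((10, 3))
Q = P @ R_true.T + t_true
Q[0] += 5.0                                          # corrupt one correspondence...
w = np.ones(10)
w[0] = 0.0                                           # ...and give it zero weight
R, t = weighted_procrustes(P, Q, w)
```

Down-weighting the corrupted correspondence lets the solve recover the true pose exactly, which is why learned per-point weights are useful when some predicted correspondences are unreliable.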

[331]  arXiv:2405.02243 [pdf, other]
Title: Towards Improving Learning from Demonstration Algorithms via MCMC Methods
Comments: arXiv admin note: text overlap with arXiv:2207.04638, arXiv:2204.03597 by other authors
Subjects: Robotics (cs.RO)

Behavioral cloning, or more broadly, learning from demonstrations (LfD), is a promising direction for robot policy learning in complex scenarios. Although straightforward to implement and data-efficient, behavioral cloning has its own drawbacks, limiting its efficacy in real robot setups. In this work, we take one step towards improving learning from demonstration algorithms by leveraging implicit energy-based policy models. Results suggest that in selected complex robot policy learning scenarios, treating supervised policy learning with an implicit model performs better, on average, than commonly used neural network-based explicit models, especially in cases of approximating potentially discontinuous and multimodal functions.

[332]  arXiv:2405.02246 [pdf, other]
Title: What matters when building vision-language models?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. To address this issue, we conduct extensive experiments around pre-trained models, architecture choice, data, and training methods. Our consolidation of findings includes the development of Idefics2, an efficient foundational VLM of 8 billion parameters. Idefics2 achieves state-of-the-art performance within its size category across various multimodal benchmarks, and is often on par with models four times its size. We release the model (base, instructed, and chat) along with the datasets created for its training.

[333]  arXiv:2405.02250 [pdf, other]
Title: Geometric Fabrics: a Safe Guiding Medium for Policy Learning
Subjects: Robotics (cs.RO)

Robot policies are always subject to complex, second-order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induces straight-line motion towards these action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.

[334]  arXiv:2405.02256 [pdf, other]
Title: Efficient computation of topological integral transforms
Comments: Accepted in Symposium on Experimental Algorithms (SEA) 2024
Subjects: Computational Geometry (cs.CG)

Topological integral transforms have found many applications in shape analysis, from prediction of clinical outcomes in brain cancer to analysis of barley seeds. Using Euler characteristic as a measure, these objects record rich geometric information on weighted polytopal complexes. While some implementations exist, they only enable discretized representations of the transforms, and they do not handle weighted complexes (such as images). Moreover, recent hybrid transforms lack an implementation.
In this paper, we introduce Eucalc, a novel implementation of three topological integral transforms -- the Euler characteristic transform, the Radon transform, and hybrid transforms -- for weighted cubical complexes. Leveraging piecewise linear Morse theory and Euler calculus, the algorithms significantly reduce computational complexity by focusing on critical points. Our software provides exact representations of transforms, handles both binary and grayscale images, and supports multi-core processing. It is publicly available as a C++ library with a Python wrapper. We present mathematical foundations, implementation details, and experimental evaluations, demonstrating Eucalc's efficiency.
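The basic quantity these transforms integrate is the Euler characteristic of a cubical complex. The brute-force sketch below computes it for a binary image (as vertices minus edges plus squares of the closed pixels) and for the sublevel sets of a grayscale image. It does not use Eucalc's API or its critical-point algorithm; the function names are hypothetical, and this naive version visits every pixel and threshold, which is exactly the cost Eucalc avoids.

```python
import numpy as np

def euler_characteristic(mask):
    """Euler characteristic V - E + F of the union of closed unit squares (pixels) in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    verts, edges, faces = set(), set(), 0
    for i, j in zip(*np.nonzero(mask)):
        i, j = int(i), int(j)
        faces += 1
        verts.update({(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)})
        edges.update({
            ((i, j), (i, j + 1)), ((i + 1, j), (i + 1, j + 1)),  # horizontal edges
            ((i, j), (i + 1, j)), ((i, j + 1), (i + 1, j + 1)),  # vertical edges
        })
    return len(verts) - len(edges) + faces

def euler_curve(image, thresholds):
    """Euler characteristic of each sublevel set {image <= t} of a grayscale image."""
    image = np.asarray(image)
    return [euler_characteristic(image <= t) for t in thresholds]

ring = np.ones((3, 3))
ring[1, 1] = 0
print(euler_characteristic(ring))           # 0: one component with one hole
gray = np.ones((3, 3))
gray[1, 1] = 2
print(euler_curve(gray, [0, 1, 2]))         # [0, 0, 1]
```

An Euler characteristic transform repeats such a curve for many directional height functions; the observation exploited by Morse-theoretic methods is that the curve only changes at critical values, so evaluating at every threshold as above is wasteful.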

[335]  arXiv:2405.02260 [pdf, other]
Title: Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows
Subjects: Human-Computer Interaction (cs.HC)

Domain experts can play a crucial role in guiding data scientists to optimize machine learning models while ensuring contextual relevance for downstream use. However, in current workflows, such collaboration is challenging due to differing expertise, abstract documentation practices, and lack of access and visibility into low-level implementation artifacts. To address these challenges and enable domain expert participation, we introduce CellSync, a collaboration framework comprising (1) a Jupyter Notebook extension that continuously tracks changes to dataframes and model metrics and (2) a Large Language Model powered visualization dashboard that makes those changes interpretable to domain experts. Through CellSync's cell-level dataset visualization with code summaries, domain experts can interactively examine how individual data and modeling operations impact different data segments. The chat features enable data-centric conversations and targeted feedback to data scientists. Our preliminary evaluation shows that CellSync provides transparency and promotes critical discussions about the intents and implications of data operations.

[336]  arXiv:2405.02261 [pdf, other]
Title: Comparing Personalized Relevance Algorithms for Directed Graphs
Comments: 4 pages, 1 figure. To appear at 2024 IEEE 40th International Conference on Data Engineering (ICDE)
Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY)

We present an interactive Web platform that, given a directed graph, allows identifying the most relevant nodes related to a given query node. Besides well-established algorithms such as PageRank and Personalized PageRank, the demo includes Cyclerank, a novel algorithm that addresses some of their limitations by leveraging cyclic paths to compute personalized relevance scores. Our demo design enables two use cases: (a) algorithm comparison, comparing the results obtained with different algorithms, and (b) dataset comparison, for exploring and gaining insights into a dataset and comparing it with others. We provide 50 pre-loaded datasets from Wikipedia, Twitter, and Amazon and seven algorithms. Users can upload new datasets, and new algorithms can be easily added. By showcasing efficient algorithms to compute relevance scores in directed graphs, our tool helps to uncover hidden relationships within the data, making it a valuable addition to the repertoire of graph analysis algorithms.
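For reference, the Personalized PageRank baseline mentioned above can be computed by power iteration. This sketch does not reproduce Cyclerank; the restart probability 0.15, the convergence tolerance, and the choice to send the mass of dangling nodes back to the query node are assumptions (one common convention among several).

```python
import numpy as np

def personalized_pagerank(edges, n, query, alpha=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for personalized PageRank with restart probability alpha at `query`."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    p = np.zeros(n)
    p[query] = 1.0
    for _ in range(max_iter):
        nxt = np.zeros(n)
        for u in range(n):
            if out[u]:
                share = (1 - alpha) * p[u] / len(out[u])
                for v in out[u]:
                    nxt[v] += share
            else:
                nxt[query] += (1 - alpha) * p[u]  # dangling node: jump back to the query
        nxt[query] += alpha  # restart at the query node
        converged = np.abs(nxt - p).sum() < tol
        p = nxt
        if converged:
            break
    return p

scores = personalized_pagerank([(0, 1), (1, 2), (2, 0), (2, 3)], n=4, query=0)
print(np.argsort(-scores))  # nodes ranked by relevance to node 0
```

Because the walk restarts at the query node with probability alpha at every step, the query node always keeps at least alpha of the total mass, and scores decay with distance from it.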

[337]  arXiv:2405.02266 [pdf, other]
Title: On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. In parallel, test-time augmentation, which utilizes multiple augmented views of a single image to enhance zero-shot generalization, is emerging as a significant area of interest. This has predominantly directed research efforts toward test-time prompt tuning. In contrast, we introduce a robust MeanShift for Test-time Augmentation (MTA), which surpasses prompt-based methods without requiring this intensive training procedure. This positions MTA as an ideal solution for both standalone and API-based applications. Additionally, our method does not rely on ad hoc rules (e.g., confidence threshold) used in some previous test-time augmentation techniques to filter the augmented views. Instead, MTA incorporates a quality assessment variable for each view directly into its optimization process, termed the inlierness score. This score is jointly optimized with a density mode seeking process, leading to an efficient training- and hyperparameter-free approach. We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency. Deployed easily as a plug-and-play module on top of zero-shot models and state-of-the-art few-shot methods, MTA shows systematic and consistent improvements.
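The "density mode seeking" part of MTA builds on mean shift. The sketch below shows a weighted mean shift iteration with a Gaussian kernel, where each view's weight plays the role an inlierness score would. Unlike MTA, the weights here are fixed (all ones) rather than jointly optimized, the bandwidth is chosen by hand, and 2D points stand in for image embeddings; all of these are assumptions of the sketch.

```python
import numpy as np

def weighted_mean_shift(X, w, bandwidth=1.0, iters=100, tol=1e-10):
    """Seek a density mode of the points X (one row per augmented view),
    scaling each point's Gaussian-kernel contribution by a per-view weight w."""
    y = (w[:, None] * X).sum(axis=0) / w.sum()  # start from the weighted mean
    for _ in range(iters):
        k = w * np.exp(-np.sum((X - y) ** 2, axis=1) / (2 * bandwidth ** 2))
        y_new = (k[:, None] * X).sum(axis=0) / k.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

rng = np.random.default_rng(0)
inliers = np.array([1.0, 0.0]) + 0.05 * rng.standard_normal((20, 2))
outliers = np.tile([-5.0, 5.0], (5, 1))        # e.g. badly cropped views
X = np.vstack([inliers, outliers])
mode = weighted_mean_shift(X, np.ones(len(X)))
print(X.mean(axis=0), mode)  # the plain mean is pulled towards the outliers; the mode is not
```

Averaging all view embeddings lets a few bad augmentations drag the result away from the dominant cluster, whereas the mode stays on it; MTA's inlierness scores additionally learn to down-weight such views.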

[338]  arXiv:2405.02267 [pdf, other]
Title: Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Pre-trained language models (PLM), for example BERT or RoBERTa, mark the state-of-the-art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in real-world applications, due to significant GPU memory requirements and high inference latency. This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency, for example in terms of model size or latency, and generalization performance. We also show how we can utilize more recently developed two-stage weight-sharing NAS approaches in this setting to accelerate the search process. Unlike traditional pruning methods with fixed thresholds, we propose to adopt a multi-objective approach that identifies the Pareto optimal set of sub-networks, allowing for a more flexible and automated compression process.
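Identifying the Pareto optimal set amounts to discarding every candidate that some other candidate beats on both objectives at once. A minimal sketch, assuming each evaluated sub-network is summarized as a (size, validation error) pair with both objectives minimized; the candidate numbers are hypothetical.

```python
def pareto_front(candidates):
    """Return the non-dominated (size, error) pairs; both objectives are minimized."""
    def dominates(q, p):
        # q is no worse than p in both objectives and differs from p, so it is strictly better in one
        return q[0] <= p[0] and q[1] <= p[1] and q != p
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]

# Hypothetical sub-networks: (parameters in millions, validation error)
candidates = [(10, 0.20), (20, 0.15), (15, 0.25), (30, 0.15), (5, 0.40)]
front = pareto_front(candidates)
```

Instead of committing to one pruning threshold, the user can then pick any point on this front according to the deployment budget.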

[339]  arXiv:2405.02280 [pdf, other]
Title: DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing VLMs can track objects in in-the-wild 2D videos, while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained task of lifting 2D objects to 3D. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be inferred by re-rendering the background to match the video frames. For the object motion, we first model each object's deformation by leveraging rendering losses and multi-view generative priors in an object-centric frame, then optimize object-centric to world-frame transformations by comparing the rendered outputs against the perceived pixels and optical flow. Finally, we recompose the background and objects and optimize for relative object scales using monocular depth prediction guidance. We show extensive results on the challenging DAVIS, Kubric, and self-captured videos, detail some limitations, and provide future directions. Besides 4D scene generation, our results show that DreamScene4D enables accurate 2D point motion tracking by projecting the inferred 3D trajectories to 2D, despite never being explicitly trained to do so.
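Projecting inferred 3D trajectories to 2D for point tracking amounts, in the simplest case, to applying a camera projection per frame. The sketch below assumes an ideal pinhole camera with known intrinsics (fx, fy, cx, cy) and points already expressed in camera coordinates; the abstract does not specify DreamScene4D's camera model, so these are assumptions, and the track values are made up.

```python
from typing import List, Sequence, Tuple


def project_trajectory(
    points_cam: Sequence[Tuple[float, float, float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Tuple[float, float]]:
    """Project 3D points given in camera coordinates to pixel coordinates (pinhole model)."""
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            raise ValueError("point lies behind the camera")
        # Perspective division followed by the intrinsic scaling and offset.
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# A hypothetical 3D point track over three frames, moving right and away from the camera.
track = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.5), (1.0, 0.0, 4.0)]
print(project_trajectory(track, fx=100.0, fy=100.0, cx=64.0, cy=64.0))
```

For world-frame trajectories, each point would first be mapped into the camera frame of its video frame using the estimated camera pose before this projection.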

[340]  arXiv:2405.02285 [pdf, ps, other]
Title: Special matrices over finite fields and their applications to quantum error-correcting codes
Authors: Meng Cao
Comments: 11 pages
Subjects: Information Theory (cs.IT)

The matrix-product (MP) code $\mathcal{C}_{A,k}:=[\mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{k}]\cdot A$ with a non-singular by column (NSC) matrix $A$ plays an important role in constructing good quantum error-correcting codes. In this paper, we study the MP code when the defining matrix $A$ satisfies the condition that $AA^{\dag}$ is $(D,\tau)$-monomial. We give an explicit formula for calculating the dimension of the Hermitian hull of an MP code. We provide necessary and sufficient conditions for an MP code to be Hermitian dual-containing (HDC), almost Hermitian dual-containing (AHDC), Hermitian self-orthogonal (HSO), almost Hermitian self-orthogonal (AHSO), and Hermitian LCD, respectively. We theoretically determine the number of possible relationships among the constituent codes that yield an MP code with each of these properties. We give alternative necessary and sufficient conditions for an MP code to be AHDC and AHSO, respectively, and show several cases where an MP code is not AHDC or AHSO. We provide construction methods for HDC and AHDC MP codes, including those with optimal minimum distance lower bounds.
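The defining product $[\mathcal{C}_{1},\ldots,\mathcal{C}_{k}]\cdot A$ can be illustrated over a prime field GF(p), where arithmetic is just integers modulo p. The sketch computes a single MP codeword from one codeword per constituent code; it does not check the NSC property, and the Hermitian notions studied in the paper (which require GF(q^2) and conjugation) are not modeled. The function name is hypothetical.

```python
from typing import List, Sequence


def mp_codeword(constituents: Sequence[Sequence[int]], A: Sequence[Sequence[int]], p: int) -> List[int]:
    """Compute [c_1, ..., c_k] . A over GF(p), p prime.

    constituents[i] is a codeword c_i of length n from C_i (treated as a column);
    A is a k x m matrix. Column j of the n x m product is sum_i A[i][j] * c_i,
    and the MP codeword is the concatenation of these m columns.
    """
    k, n, m = len(constituents), len(constituents[0]), len(A[0])
    if len(A) != k:
        raise ValueError("A must have one row per constituent code")
    columns = [
        [sum(A[i][j] * constituents[i][r] for i in range(k)) % p for r in range(n)]
        for j in range(m)
    ]
    return [symbol for column in columns for symbol in column]


# With A = [[1, 1], [0, 1]] the construction reduces to the (u | u + v) form.
print(mp_codeword([[1, 0, 2], [0, 1, 1]], [[1, 1], [0, 1]], p=3))  # [1, 0, 2, 1, 1, 0]
```

Applying this map to every choice of constituent codewords yields the whole MP code of length nm.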

[341]  arXiv:2405.02287 [pdf, other]
Title: Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging, with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing and probing the capabilities of present frontier models. Notably, more than 50% of the questions in our hard set are answered incorrectly by all frontier models. We explore the nuances of designing, evaluating, and ranking models on ultra-challenging prompts. We also discuss trade-offs between human and automatic evaluation, and show that automatic model evaluation using Reka Core roughly correlates with human judgment. We offer free API access for the purpose of lightweight evaluation and plan to conduct formal human evaluations for public models that perform well on Vibe-Eval's automatic scores. We release the evaluation code and data, see https://github.com/reka-ai/reka-vibe-eval

Cross-lists for Mon, 6 May 24

[342]  arXiv:2405.01596 (cross-list from physics.soc-ph) [pdf, ps, other]
Title: Analyzing Player Involvement in the Indian Pro Kabaddi League: A Network Analysis Approach
Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)

This paper applies network analysis to all players who have participated in the Indian Pro Kabaddi League since its inception. The Kabaddi network is constructed from the teams each player has played for and the players they have played with. Players are ranked using their degree and the PageRank algorithm. A small-world phenomenon is observed in the Kabaddi network. Players' actual performance is compared with the ranks assigned to them by the network analysis.
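Degree and PageRank rankings on a co-player graph can be sketched in plain Python with power iteration. The abstract does not state how edges are weighted, so unweighted edges between teammates are an assumption, and the four-player graph is hypothetical; libraries such as networkx offer an equivalent `pagerank` function.

```python
from typing import Dict, List


def degree_centrality(adj: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of distinct co-players for each player."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def pagerank(adj: Dict[str, List[str]], damping: float = 0.85,
             max_iters: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Power-iteration PageRank on an unweighted graph given as adjacency lists."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iters):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for u in adj[v]:
                    new_rank[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical co-player graph (undirected, so each edge appears in both lists).
players = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
}
print(degree_centrality(players))
print(max(pagerank(players).items(), key=lambda kv: kv[1]))
```

On an undirected graph the two rankings tend to agree closely, since PageRank there is dominated by degree; they diverge more on weighted or directed variants.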

[343]  arXiv:2405.01600 (cross-list from eess.IV) [pdf, other]
Title: Deep Learning Descriptor Hybridization with Feature Reduction for Accurate Cervical Cancer Colposcopy Image Classification
Comments: 7 Pages double column, 5 figures, and 5 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cervical cancer stands as a predominant cause of female mortality, underscoring the need for regular screenings to enable early diagnosis and preemptive treatment of pre-cancerous conditions. The transformation zone in the cervix, where cellular differentiation occurs, plays a critical role in the detection of abnormalities. Colposcopy has emerged as a pivotal tool in cervical cancer prevention since it provides a meticulous examination of cervical abnormalities. However, challenges in visual evaluation necessitate the development of Computer Aided Diagnosis (CAD) systems.
We propose a novel CAD system that combines the strengths of various deep-learning descriptors (ResNet50, ResNet101, and ResNet152) with min-max feature normalization and a feature reduction technique (LDA). The combination of different descriptors ensures that all the features (low-level features such as edges and colour, high-level features such as shape and texture) are captured, feature normalization prevents biased learning, and feature reduction avoids overfitting. We conduct experiments on the IARC dataset provided by WHO. The dataset is first segmented and balanced. Our approach achieves exceptional performance in the range of 97%-100% for both normal-abnormal and type classification. A competing approach for type classification on the same dataset achieved 81%-91% performance.
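A minimal sketch of the descriptor hybridization and normalization steps follows, with toy numbers in place of real ResNet embeddings (the feature values and function names are hypothetical). The LDA reduction step is omitted; in practice it could be applied to the normalized matrix, for example with scikit-learn's `LinearDiscriminantAnalysis`.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def concat_descriptors(*descriptor_sets: Sequence[Sequence[float]]) -> Matrix:
    """Concatenate per-image feature vectors from several backbones (same image order)."""
    n_images = len(descriptor_sets[0])
    return [[x for ds in descriptor_sets for x in ds[i]] for i in range(n_images)]


def fit_min_max(rows: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, to be fit on training data only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows: Matrix, mins: Sequence[float], maxs: Sequence[float]) -> Matrix:
    """Scale each feature to [0, 1]; constant features map to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Toy stand-ins for pooled ResNet50 and ResNet101 features of two images.
resnet50_feats = [[1.0, 2.0], [3.0, 4.0]]
resnet101_feats = [[10.0], [20.0]]
fused = concat_descriptors(resnet50_feats, resnet101_feats)
mins, maxs = fit_min_max(fused)
print(apply_min_max(fused, mins, maxs))  # [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```

Fitting the minima and maxima on the training split only, then reusing them on test images, avoids leaking test statistics into the normalization.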

[344]  arXiv:2405.01604 (cross-list from q-fin.PM) [pdf, other]
Title: Portfolio Management using Deep Reinforcement Learning
Comments: 7 pages, 9 figures
Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG)

Algorithmic trading systems, or financial robots, have been conquering the stock markets with their ability to fathom complex statistical trading strategies. But with the recent development of deep learning technologies, these strategies are becoming ineffective. DQN and A2C models have previously outperformed eminent humans in game playing and robotics. In our work, we propose a reinforced portfolio manager that assists in allocating weights to assets. The environment gives the manager the freedom to go long and even short on the assets. The weight allocation advice is restricted to the choice of portfolio assets and is tested empirically against benchmark indices. The manager performs financial transactions in a postulated liquid market without any transaction charges. We conclude that the proposed portfolio manager, with actions centered on weight allocations, can surpass the risk-adjusted returns of conventional portfolio managers.
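The abstract describes weight allocations that may be long or short in a market without transaction charges. One common way to turn raw policy outputs into such weights, assumed here rather than taken from the paper, is to divide by the gross exposure so that absolute weights sum to one; portfolio returns and a simple Sharpe ratio then follow directly. All names and numbers are illustrative.

```python
import statistics
from typing import List, Sequence


def actions_to_weights(actions: Sequence[float]) -> List[float]:
    """Scale raw actions so that absolute weights sum to 1; negative weights are short positions."""
    gross = sum(abs(a) for a in actions)
    if gross == 0:
        return [0.0] * len(actions)  # stay out of the market
    return [a / gross for a in actions]


def portfolio_returns(weights: Sequence[float], asset_returns: Sequence[Sequence[float]]) -> List[float]:
    """Per-period portfolio return with fixed weights and no transaction costs."""
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Per-period Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


weights = actions_to_weights([2.0, -1.0, 1.0])  # long, short, long
print(weights)  # [0.5, -0.25, 0.25]
history = [[0.02, 0.01, 0.00], [-0.01, 0.02, 0.03], [0.01, -0.01, 0.02]]
rets = portfolio_returns(weights, history)
print(rets, sharpe_ratio(rets))
```

Here the weights are held fixed for simplicity; an RL agent would instead emit a new action, and hence new weights, at every step.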

[345]  arXiv:2405.01606 (cross-list from quant-ph) [pdf, other]
Title: Improving Trainability of Variational Quantum Circuits via Regularization Strategies
Comments: preprint, under review. TL;DR: we propose a regularization strategy to improve the trainability of VQCs
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

In the noisy intermediate-scale quantum (NISQ) era, variational quantum circuits (VQCs) have been widely applied in various domains, advancing the case for the superiority of quantum circuits over classical models. Similar to classical models, regular VQCs can be optimized by various gradient-based methods. However, the optimization may initially be trapped in barren plateaus or eventually get stuck at saddle points during training. These gradient issues can significantly undermine the trainability of VQCs. In this work, we propose a strategy that regularizes model parameters with prior knowledge of the training data and Gaussian noise diffusion. We conduct ablation studies to verify the effectiveness of our strategy across four public datasets and demonstrate that our method can improve the trainability of VQCs against the above-mentioned gradient issues.

[346]  arXiv:2405.01616 (cross-list from q-bio.BM) [pdf, other]
Title: Generative Active Learning for the Search of Small-molecule Protein Binders
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules that exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and molecules designed de novo by LambdaZero reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In experimental in vitro validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.

[347]  arXiv:2405.01644 (cross-list from eess.IV) [pdf, ps, other]
Title: A Classification-Based Adaptive Segmentation Pipeline: Feasibility Study Using Polycystic Liver Disease and Metastases from Colorectal Cancer CT Images
Comments: J Digit Imaging. Inform. med. (2024)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

Automated segmentation tools often encounter accuracy and adaptability issues when applied to images with different pathologies. The purpose of this study is to explore the feasibility of building a workflow that efficiently routes images to specifically trained segmentation models. By implementing a deep learning classifier that automatically classifies the images and routes them to the appropriate segmentation models, we aim for a workflow that accurately segments images with different pathologies. The data used in this study are 350 CT images from patients affected by polycystic liver disease and 350 CT images from patients presenting with liver metastases from colorectal cancer. All images had the liver manually segmented by trained imaging analysts. Our proposed adaptive segmentation workflow achieved a statistically significant improvement for the task of total liver segmentation compared to the generic single segmentation model (non-parametric Wilcoxon signed rank test, n=100, p-value << 0.001). This approach is applicable in a wide range of scenarios and should prove useful in clinical implementations of segmentation pipelines.
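The classify-then-route workflow can be sketched as a dispatch table keyed by the classifier's predicted pathology. Strings stand in for CT volumes and masks, and the labels, lambdas, and the generic fallback model are hypothetical; the abstract does not say how unrecognized classes are handled.

```python
from typing import Any, Callable, Dict, Tuple

Image = Any
Mask = Any


def route_and_segment(
    image: Image,
    classify: Callable[[Image], str],
    segmenters: Dict[str, Callable[[Image], Mask]],
    fallback: Callable[[Image], Mask],
) -> Tuple[str, Mask]:
    """Classify the image, then send it to the segmentation model trained for that class."""
    label = classify(image)
    model = segmenters.get(label, fallback)  # unknown classes use the generic model
    return label, model(image)


# Hypothetical stand-ins for the pathology-specific liver segmentation models.
segmenters = {
    "polycystic": lambda img: f"pld-mask({img})",
    "metastases": lambda img: f"crlm-mask({img})",
}
classify = lambda img: "polycystic" if "cyst" in img else "metastases"
print(route_and_segment("scan_cyst_01", classify, segmenters, fallback=lambda img: f"generic-mask({img})"))
```

Keeping the classifier and the segmenters decoupled means a new pathology only requires adding one entry to the table and retraining the classifier.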

[348]  arXiv:2405.01658 (cross-list from eess.IV) [pdf, other]
Title: MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems
Comments: Accepted in DCA in MI Workshop@CVPR2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The acquisition of different data modalities can enhance our knowledge and understanding of various diseases, paving the way for more personalized healthcare. Thus, medicine is progressively moving towards the generation of massive amounts of multi-modal data (\emph{e.g.,} molecular, radiology, and histopathology). While this may seem like an ideal environment to capitalize on data-centric machine learning approaches, most methods still focus on exploring a single modality or a pair of modalities for a variety of reasons: i) lack of ready-to-use curated datasets; ii) difficulty in identifying the best multi-modal fusion strategy; and iii) missing modalities across patients. In this paper we introduce a real-world multi-modal dataset called MMIST-ccRCC that comprises 2 radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC). We provide single and multi-modal (early and late fusion) benchmarks for the task of 12-month survival prediction in the challenging scenario of one or more missing modalities for each patient, with missing rates that range from 26$\%$ for genomics data to more than 90$\%$ for MRI. We show that even with such severe missing rates, the fusion of modalities leads to improvements in survival forecasting. Additionally, incorporating a strategy to generate the latent representations of the missing modalities from the available ones further improves performance, highlighting a potential complementarity across modalities. Our dataset and code are available here: https://multi-modal-ist.github.io/datasets/ccRCC
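Late fusion tolerates missing modalities naturally, because each modality produces its own prediction and only the available ones are combined. The sketch below averages per-modality survival probabilities while skipping missing entries; this weighted-average rule is one simple assumption, not necessarily the benchmark's exact fusion head, and the patient values are invented.

```python
from typing import Dict, Optional


def late_fusion(
    predictions: Dict[str, Optional[float]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-modality survival probabilities, skipping missing modalities."""
    available = {m: p for m, p in predictions.items() if p is not None}
    if not available:
        raise ValueError("no modality available for this patient")
    if weights is None:
        weights = {m: 1.0 for m in available}
    total = sum(weights[m] for m in available)
    return sum(weights[m] * p for m, p in available.items()) / total


# A hypothetical patient whose MRI is missing.
patient = {"ct": 0.8, "mri": None, "histopathology": 0.7, "genomics": 0.6}
print(late_fusion(patient))
```

Early fusion, by contrast, concatenates features before a shared model and therefore needs imputation or generated latent representations, such as those described above, when a modality is absent.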

[349]  arXiv:2405.01715 (cross-list from q-bio.GN) [pdf, other]
Title: Identification of SNPs in genomes using GRAMEP, an alignment-free method based on the Principle of Maximum Entropy
Subjects: Genomics (q-bio.GN); Information Theory (cs.IT); Applications (stat.AP)

Advances in high-throughput sequencing technologies provide a large number of genomes to be analyzed, so computational methodologies play a crucial role in analyzing and extracting knowledge from the data generated. Investigating genomic mutations is critical because of their impact on chromosomal evolution, genetic disorders, and diseases. It is common to align sequences to analyze genomic variations; however, this approach can be computationally expensive and potentially arbitrary in scenarios with large datasets. Here, we present a novel method for identifying single nucleotide polymorphisms (SNPs) in DNA sequences from assembled genomes. This method uses the principle of maximum entropy to select the most informative k-mers specific to the variant under investigation. The use of this informative k-mer set enables the detection of variant-specific mutations in comparison to a reference sequence. In addition, our method offers the possibility of classifying novel sequences with no need for organism-specific information. GRAMEP demonstrated high accuracy in both in silico simulations and analyses of real viral genomes, including Dengue, HIV, and SARS-CoV-2. Our approach maintained accurate SARS-CoV-2 variant identification while demonstrating a lower computational cost compared to the gold-standard statistical tools. The source code for this proof-of-concept implementation is freely available at https://github.com/omatheuspimenta/GRAMEP.
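The alignment-free k-mer idea can be sketched as follows: count the k-mers of each sequence, keep those shared by all sequences of a variant but absent from the reference, and measure the spread of a k-mer distribution with Shannon entropy. This is a simplified stand-in, not GRAMEP's maximum-entropy selection procedure; the sequences and function names are made up.

```python
import math
from collections import Counter
from typing import Iterable, Set


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def variant_specific_kmers(variant_seqs: Iterable[str], reference: str, k: int) -> Set[str]:
    """k-mers present in every variant sequence but absent from the reference."""
    shared = None
    for seq in variant_seqs:
        kmers = set(kmer_counts(seq, k))
        shared = kmers if shared is None else shared & kmers
    return (shared or set()) - set(kmer_counts(reference, k))


def shannon_entropy(counts: Counter) -> float:
    """Entropy (in bits) of the empirical k-mer distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


reference = "ACGTACGT"
variants = ["ACGAACGT", "TTACGAACGT"]  # both carry a T->A change at the 4th base
print(sorted(variant_specific_kmers(variants, reference, k=3)))  # ['AAC', 'CGA', 'GAA']
print(shannon_entropy(kmer_counts(reference, k=3)))
```

The variant-specific k-mers all overlap the mutated position, which is what allows them to localize the SNP relative to the reference without a full alignment.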

[350]  arXiv:2405.01725 (cross-list from eess.IV) [pdf, other]
Title: Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning has made significant progress in computer vision, specifically in image classification, object detection, and semantic segmentation. The skip connection has played an essential role in the architecture of deep neural networks, enabling easier optimization through residual learning during the training stage and improving accuracy during testing. Many neural networks have inherited the idea of residual learning with skip connections for various tasks, and it has become the standard choice for designing neural networks. This survey provides a comprehensive summary and outlook on the development of skip connections in deep neural networks. A short history of skip connections is outlined, and the development of residual learning in deep neural networks is surveyed. The effectiveness of skip connections in the training and testing stages is summarized, and future directions for using skip connections in residual learning are discussed. Finally, we summarize seminal papers, source code, models, and datasets that utilize skip connections in computer vision, including image classification, object detection, semantic segmentation, and image reconstruction. We hope this survey will inspire fellow researchers in the community to further develop skip connections in various forms and tasks, as well as the theory of residual learning in deep neural networks. The project page can be found at https://github.com/apple1986/Residual_Learning_For_Images
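In its simplest form, a skip connection adds a block's input to the output of its residual branch, y = x + F(x). The sketch below shows this on plain Python lists with a toy branch F (a hypothetical stand-in for convolutional layers); real networks operate on tensors and often need a projection on the skip path when shapes differ.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Return x + f(x): the skip connection carries x past the residual branch f."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must preserve the input shape")
    return [a + b for a, b in zip(x, fx)]


def scaled_relu(v: Vector) -> Vector:
    """A toy residual branch: ReLU followed by scaling by 0.5."""
    return [0.5 * max(0.0, e) for e in v]


x = [1.0, -2.0, 3.0]
print(residual_block(x, scaled_relu))               # [1.5, -2.0, 4.5]
print(residual_block(x, lambda v: [0.0] * len(v)))  # identity when the branch outputs zeros
```

Because a zero branch yields the identity map, a block can fall back to passing its input through unchanged, which is one common explanation for why very deep residual networks are easier to optimize.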

[351]  arXiv:2405.01726 (cross-list from eess.IV) [pdf, ps, other]
Title: SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Denoising hyperspectral images (HSIs) is a crucial preprocessing procedure due to noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain-specific knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these priors separately. While such strategies can avoid some redundant information, hyperspectral images are 3-D images with strong spatial continuity and spectral correlation, so this kind of strategy inevitably overlooks subtle long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, termed Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. Thanks to the linear space complexity of State Space Model (SSM) computations, we can obtain complete global spatial-spectral correlation within a module. We introduce a Spatial-Spectral Alternating Scan (SSAS) strategy for HSI data, which helps model the information flow in multiple directions in 3-D HSIs. Experimental results demonstrate that our method outperforms several compared methods. The source code will be available at https://github.com/lronkitty/SSUMamba.
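The abstract does not spell out the SSAS scan patterns, so the following is only a hypothetical illustration of the general idea: flattening an H x W x B hyperspectral cube into 1-D sequences in several directions (spectral-first, spatial-first, and their reverses) and cycling through those orders across successive blocks, so that a sequence model sees the cube from different directions. The function names and pattern list are assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple

Index = Tuple[int, int, int]  # (row, column, band)


def scan_order(shape: Sequence[int], axis_order: Sequence[int], reverse: bool = False) -> List[Index]:
    """Flatten an (H, W, B) grid; axis_order lists axes from slowest- to fastest-varying."""
    order = []
    for idx in product(*(range(shape[a]) for a in axis_order)):
        full = [0, 0, 0]
        for axis, i in zip(axis_order, idx):
            full[axis] = i
        order.append((full[0], full[1], full[2]))
    return order[::-1] if reverse else order


# Hypothetical pattern cycle: spectral-first, its reverse, spatial-first, its reverse.
PATTERNS = [((0, 1, 2), False), ((0, 1, 2), True), ((2, 0, 1), False), ((2, 0, 1), True)]


def alternating_scans(shape: Sequence[int], num_blocks: int) -> List[List[Index]]:
    """One scan order per block, alternating through PATTERNS."""
    return [scan_order(shape, *PATTERNS[b % len(PATTERNS)]) for b in range(num_blocks)]


for scan in alternating_scans((1, 2, 2), num_blocks=4):
    print(scan)
```

With axis order (0, 1, 2) the bands of each pixel are contiguous in the sequence, while (2, 0, 1) visits a full spatial slice before moving to the next band; every order is a permutation of the same voxels, so no information is dropped.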

[352]  arXiv:2405.01730 (cross-list from eess.AS) [pdf, other]
Title: Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
Comments: Accepted by Speaker Odyssey 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Expressive voice conversion (VC) performs speaker identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Emotional style modeling for arbitrary speakers in expressive VC has not been extensively explored. Previous approaches have relied on vocoders for speech reconstruction, which makes speech quality heavily dependent on vocoder performance. A major challenge of expressive VC lies in emotion prosody modeling. To address these challenges, this paper proposes a fully end-to-end expressive VC framework based on a conditional denoising diffusion probabilistic model (DDPM). We utilize speech units derived from self-supervised speech models as content conditioning, along with deep features extracted from speech emotion recognition and speaker verification systems to model emotional style and speaker identity. Objective and subjective evaluations show the effectiveness of our framework. Code and samples are publicly available.
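For readers unfamiliar with DDPMs, the forward (noising) process that such a model learns to invert can be sketched in a few lines. With a variance schedule beta_t and alpha_bar_t = prod_{s<=t} (1 - beta_s), a clean sample x_0 is noised as x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps with eps ~ N(0, I). The linear schedule from 1e-4 to 0.02 over 1000 steps follows the original DDPM paper and is an assumption here; the conditioning on speech units, emotion, and speaker features described above is not shown.

```python
import math
import random
from typing import List, Sequence


def linear_betas(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_0 .. beta_{T-1} (0-based step index)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas: Sequence[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: Sequence[float], t: int, alpha_bar: Sequence[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) in closed form, element-wise with Gaussian noise."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_betas(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
print(abar[0], abar[-1])  # close to 1 at the start, close to 0 at the end
print(q_sample([0.5, -0.25], t=999, alpha_bar=abar, rng=rng))
```

The denoising network is trained to predict eps from x_t (plus the conditioning signals), and sampling runs the learned reverse process from pure noise back to a clean feature sequence.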

[353]  arXiv:2405.01737 (cross-list from stat.ML) [pdf, other]
Title: Sample-efficient neural likelihood-free Bayesian inference of implicit HMMs
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)

Likelihood-free inference methods based on neural conditional density estimation have been shown to drastically reduce the simulation burden in comparison to classical methods such as ABC. When applied in the context of any latent variable model, such as a hidden Markov model (HMM), these methods are designed to estimate only the parameters, rather than the joint distribution of the parameters and the hidden states. Naive application of these methods to an HMM, ignoring the inference of this joint posterior distribution, will thus produce an inaccurate estimate of the posterior predictive distribution, in turn hampering the assessment of goodness-of-fit. To rectify this problem, we propose a novel, sample-efficient likelihood-free method for estimating the high-dimensional hidden states of an implicit HMM. Our approach relies on directly learning the intractable posterior distribution of the hidden states with an autoregressive flow, exploiting the Markov property. Upon evaluating our approach on some implicit HMMs, we found that the quality of the estimates obtained with our method is comparable to what can be achieved using a much more computationally expensive SMC algorithm.
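The Markov property is what makes an autoregressive factorization of the hidden-state posterior natural. For an HMM with parameters $\theta$, hidden states $x_{1:T}$, and observations $y_{1:T}$ (our notation, not necessarily the paper's), the joint posterior splits into a parameter posterior and a state posterior, and because $x_t$ is conditionally independent of $x_{1:t-2}$ and $y_{1:t-1}$ given $x_{t-1}$, the state posterior factorizes as follows:

```latex
p(\theta, x_{1:T} \mid y_{1:T}) = p(\theta \mid y_{1:T})\, p(x_{1:T} \mid \theta, y_{1:T}),
\qquad
p(x_{1:T} \mid \theta, y_{1:T}) = p(x_1 \mid \theta, y_{1:T}) \prod_{t=2}^{T} p(x_t \mid x_{t-1}, \theta, y_{t:T}).
```

Each conditional on the right-hand side is a natural target for one step of an autoregressive density estimator such as a normalizing flow; how the paper parameterizes these conditionals is not stated in the abstract.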

[354]  arXiv:2405.01750 (cross-list from eess.IV) [pdf, other]
Title: PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In the context of Intelligent Transportation Systems (ITS), efficient data compression is crucial for managing large-scale point cloud data acquired by roadside LiDAR sensors. The demand for efficient storage, streaming, and real-time object detection on point cloud data is substantial. This work introduces PointCompress3D, a novel point cloud compression framework tailored specifically for roadside LiDARs. Our framework addresses the challenges of compressing high-resolution point clouds while maintaining accuracy and compatibility with roadside LiDAR sensors. We adapt, extend, integrate, and evaluate three cutting-edge compression methods using our real-world-based TUMTraf dataset family. We achieve a frame rate of 10 FPS with compressed sizes below 105 Kb, a 50-fold reduction, while maintaining object detection performance on par with the original data. In extensive experiments and ablation studies, we achieve a PSNR d2 of 94.46 and a BPP of 6.54 on our dataset. Future work includes deployment on the live system. The code is available on our project website: https://pointcompress3d.github.io.

[355]  arXiv:2405.01756 (cross-list from physics.med-ph) [pdf, other]
Title: Segmentation-Free Outcome Prediction in Head and Neck Cancer: Deep Learning-based Feature Extraction from Multi-Angle Maximum Intensity Projections (MA-MIPs) of PET Images
Comments: 15 pages, 4 tables, 4 figures. Submitted for European Journal of Nuclear Medicine and Medical Imaging
Subjects: Medical Physics (physics.med-ph); Artificial Intelligence (cs.AI)

We introduce an innovative, simple, and effective segmentation-free approach for outcome prediction in head \& neck cancer (HNC) patients. By harnessing deep learning-based feature extraction techniques and multi-angle maximum intensity projections (MA-MIPs) applied to Fluorodeoxyglucose Positron Emission Tomography (FDG-PET) volumes, our proposed method eliminates the need for manual segmentations of regions-of-interest (ROIs) such as primary tumors and involved lymph nodes. Instead, a state-of-the-art object detection model is trained to perform automatic cropping of the head and neck region on the PET volumes. A pre-trained deep convolutional neural network backbone is then used to extract deep features from MA-MIPs obtained from 72 multi-angle axial rotations of the cropped PET volumes. These deep features, extracted from multiple projection views of the PET volumes, are then aggregated, fused, and used to perform recurrence-free survival analysis on a cohort of 489 HNC patients. The proposed approach outperforms the best-performing method on the target dataset for the task of recurrence-free survival analysis. By circumventing the manual delineation of the malignancies on the FDG PET-CT images, our approach eliminates the dependency on subjective interpretations and greatly enhances the reproducibility of the proposed survival analysis method.

[356]  arXiv:2405.01761 (cross-list from stat.ML) [pdf, other]
Title: Multivariate Bayesian Last Layer for Regression: Uncertainty Quantification and Disentanglement
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We present new Bayesian Last Layer models in the setting of multivariate regression under heteroscedastic noise, and propose an optimization algorithm for parameter learning. Bayesian Last Layer combines Bayesian modelling of the predictive distribution with neural networks for parameterization of the prior, and has the attractive property of uncertainty quantification with a single forward pass. The proposed framework is capable of disentangling the aleatoric and epistemic uncertainty, and can be used to transfer a canonically trained deep neural network to new data domains with uncertainty-aware capability.

[357]  arXiv:2405.01770 (cross-list from math.OC) [pdf, other]
Title: Bike network planning in limited urban space
Subjects: Optimization and Control (math.OC); Computational Engineering, Finance, and Science (cs.CE)

The lack of cycling infrastructure in urban environments hinders the adoption of cycling as a viable mode for commuting, despite the evident benefits of (e-)bikes as sustainable, efficient, and health-promoting transportation modes. Bike network planning is a tedious process that relies on heuristic computational methods, which frequently overlook the broader implications of introducing new cycling infrastructure, in particular the necessity to repurpose car lanes. In this work, we call for optimizing the trade-off between bike and car networks, effectively pushing for Pareto optimality. This shift in perspective gives rise to a novel linear programming formulation for optimal bike network allocation. Our experiments, conducted using both real-world and synthetic data, attest to the effectiveness and superiority of this optimization approach compared to heuristic methods. In particular, the framework provides stakeholders with a range of lane reallocation scenarios, illustrating potential bike network enhancements and their implications for car infrastructure. Crucially, our approach is adaptable to various bikeability and car accessibility evaluation criteria, making our tool a highly flexible and scalable resource for urban planning. This paper presents an advanced decision-support framework that can significantly aid urban planners in making informed decisions on cycling infrastructure development.

[358]  arXiv:2405.01822 (cross-list from eess.IV) [pdf, other]
Title: Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. The goal of this challenge was to promote the development of deep generative models (DGMs) for medical imaging and to emphasize the need for their domain-relevant assessment via the analysis of relevant image statistics. As part of this Grand Challenge, a training dataset was developed based on 3D anthropomorphic breast phantoms from the VICTRE virtual imaging toolbox. A two-stage evaluation procedure was developed, consisting of a preliminary check for memorization and image quality (based on the Frechet Inception distance (FID)) and a second stage evaluating the reproducibility of image statistics corresponding to domain-relevant radiomic features. A summary measure was employed to rank the submissions. Additional analyses of submissions were performed to assess DGM performance specific to individual feature families, and to identify various artifacts. In total, 58 submissions from 12 unique users were received for this Challenge. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network followed by another network for image super-resolution. We observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.
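For reference, the Frechet Inception distance used in the first evaluation stage compares Gaussian fits to Inception-network features of real and generated images. With feature means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$, it is defined as

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values are better, and FID is zero when the two fitted Gaussians coincide. Because it summarizes generic natural-image features through their first two moments, it need not track the domain-relevant radiomic statistics examined in the second stage, which may help explain why the two rankings can disagree.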

[359]  arXiv:2405.01834 (cross-list from physics.comp-ph) [pdf, other]
Title: 3-center and 4-center 2-particle Gaussian AO integrals on modern accelerated processors
Subjects: Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Distributed, Parallel, and Cluster Computing (cs.DC); Chemical Physics (physics.chem-ph)

We report an implementation of the McMurchie-Davidson (MD) algorithm for 3-center and 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with low and high angular momenta $l$ and varying degrees of contraction for graphics processing units (GPUs). This work builds upon our recent implementation of a matrix form of the MD algorithm that is efficient for GPU evaluation of 4-center 2-particle integrals over Gaussian AOs of high angular momenta ($l\geq 4$) [$\mathit{J. Phys. Chem. A}\ \mathbf{127}$, 10889 (2023)]. The use of unconventional data layouts and three variants of the MD algorithm makes it possible to evaluate integrals in double precision with sustained performance between 25% and 70% of the theoretical hardware peak. Performance assessment includes integrals over AOs with $l\leq 6$ (higher $l$ is supported). A preliminary implementation of the Hartree-Fock exchange operator is presented and assessed for computations with up to quadruple-zeta basis sets and more than 20,000 AOs. The corresponding C++ code is a part of the experimental open-source $\mathtt{LibintX}$ library available at $\mathbf{github.com:ValeevGroup/LibintX}$.
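At the core of the MD scheme are the Hermite expansion coefficients $E_t^{ij}$, which expand a product of two Cartesian Gaussians in Hermite Gaussians centered at the product center P. The sketch below implements the standard one-dimensional recurrences and applies them to the simplest case, a 1-D overlap integral ($E_0^{ij}\sqrt{\pi/p}$ with $p = a + b$). It is a scalar, textbook-style illustration only. It is not the paper's matrix formulation, GPU data layout, or its 3- and 4-center 2-particle integrals, which additionally require Hermite Coulomb integrals and the Boys function.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def hermite_e(i: int, j: int, t: int, a: float, b: float, qx: float) -> float:
    """McMurchie-Davidson Hermite expansion coefficient E_t^{ij} in one dimension.

    a, b: Gaussian exponents on centers A and B; qx = A_x - B_x.
    """
    p = a + b
    q = a * b / p
    if t < 0 or t > i + j:
        return 0.0
    if i == 0 and j == 0:  # t == 0 is guaranteed by the bounds check above
        return math.exp(-q * qx * qx)
    if i > 0:
        # Lower i; X_PA = -(b / p) * qx = -(q / a) * qx
        return (hermite_e(i - 1, j, t - 1, a, b, qx) / (2 * p)
                - (q * qx / a) * hermite_e(i - 1, j, t, a, b, qx)
                + (t + 1) * hermite_e(i - 1, j, t + 1, a, b, qx))
    # Lower j; X_PB = (a / p) * qx = (q / b) * qx
    return (hermite_e(i, j - 1, t - 1, a, b, qx) / (2 * p)
            + (q * qx / b) * hermite_e(i, j - 1, t, a, b, qx)
            + (t + 1) * hermite_e(i, j - 1, t + 1, a, b, qx))


def overlap_1d(i: int, j: int, a: float, b: float, ax: float, bx: float) -> float:
    """1-D overlap of (x - A)^i exp(-a (x - A)^2) and (x - B)^j exp(-b (x - B)^2)."""
    return hermite_e(i, j, 0, a, b, ax - bx) * math.sqrt(math.pi / (a + b))


print(overlap_1d(0, 0, 1.0, 1.0, 0.0, 0.0))     # sqrt(pi / 2)
print(overlap_1d(2, 1, 0.8, 1.3, 0.4, -0.3))    # higher angular momentum, distinct centers
```

The recurrences only lower angular momentum by one per call, so memoizing them (here with `lru_cache`) keeps the cost polynomial; production codes instead tabulate all $E_t^{ij}$ for a shell pair at once, which is the kind of structure a matrix formulation can exploit.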

[360]  arXiv:2405.01879 (cross-list from math.CO) [pdf, other]
Title: Unavoidable induced subgraphs in graphs with complete bipartite induced minors
Comments: 25 pages, 12 figures
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

We prove that if a graph contains the complete bipartite graph $K_{134, 12}$ as an induced minor, then it contains a cycle of length at most 12 or a theta as an induced subgraph. With a longer and more technical proof, we prove that if a graph contains $K_{3, 4}$ as an induced minor, then it contains a triangle or a theta as an induced subgraph. Here, a theta is a graph made of three internally vertex-disjoint chordless paths $P_1 = a \dots b$, $P_2 = a \dots b$, $P_3 = a \dots b$, each of length at least two, such that no edges exist between the paths except the three edges incident to $a$ and the three edges incident to $b$.
A consequence is that excluding a grid and a complete bipartite graph as induced minors is not enough to guarantee a bounded tree-independence number, or even that the treewidth is bounded by a function of the size of the maximum clique, because the existence of graphs with large treewidth that contain no triangles or thetas as induced subgraphs is already known (the so-called layered wheels).

[361]  arXiv:2405.01881 (cross-list from q-fin.RM) [pdf, ps, other]
Title: Explainable Risk Classification in Financial Reports
Subjects: Risk Management (q-fin.RM); Machine Learning (cs.LG)

Every publicly traded company in the US is required to file an annual 10-K financial report, which contains a wealth of information about the company. In this paper, we propose an explainable deep-learning model, called FinBERT-XRC, that takes a 10-K report as input, and automatically assesses the post-event return volatility risk of its associated company. In contrast to previous systems, our proposed model simultaneously offers explanations of its classification decision at three different levels: the word, sentence, and corpus levels. By doing so, our model provides a comprehensive interpretation of its prediction to end users. This is particularly important in financial domains, where the transparency and accountability of algorithmic predictions play a vital role in their application to decision-making processes. Aside from its novel interpretability, our model surpasses the state of the art in predictive accuracy in experiments on a large real-world dataset of 10-K reports spanning six years.

[362]  arXiv:2405.01944 (cross-list from math.CO) [pdf, other]
Title: Topologically Interlocking Blocks inside the Tetroctahedrille
Comments: Keywords: Topological interlocking, Space Fillings, Triangulations, Origami, Approximations
Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG)

A topological interlocking assembly consists of rigid blocks together with a fixed frame, such that any subset of blocks is kinematically constrained and therefore cannot be removed from the assembly. In this paper we pursue a modular approach to construct (non-convex) interlocking blocks by combining finitely many tetrahedra and octahedra. This gives rise to polyhedra whose vertices can be described by the tetrahedral-octahedral honeycomb, also known as tetroctahedrille. We show that the resulting interlocking blocks are very versatile and allow many possibilities to form topological interlocking assemblies consisting of copies of a single block. We formulate a generalised construction of some of the introduced blocks to construct families of topological interlocking blocks. Moreover, we demonstrate a geometric application by using the tetroctahedrille to approximate given geometric objects. Finally, we show that given topological interlocking assemblies can be deformed continuously in order to obtain new topological interlocking assemblies.

[363]  arXiv:2405.01952 (cross-list from stat.ML) [pdf, other]
Title: Three Quantization Regimes for ReLU Networks
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

We establish the fundamental limits in the approximation of Lipschitz functions by deep ReLU neural networks with finite-precision weights. Specifically, we identify three regimes, namely under-, over-, and proper quantization, in terms of the behavior of the minimax approximation error as a function of network weight precision. This is accomplished by deriving nonasymptotic tight lower and upper bounds on the minimax approximation error. Notably, in the proper-quantization regime, neural networks exhibit memory-optimality in the approximation of Lipschitz functions. Deep networks have an inherent advantage over shallow networks in achieving memory-optimality. We also develop the notion of a depth-precision tradeoff, showing that networks with high-precision weights can be converted into functionally equivalent deeper networks with low-precision weights while preserving memory-optimality. This idea is reminiscent of sigma-delta analog-to-digital conversion, where oversampling rate is traded for resolution in the quantization of signal samples. We improve upon the best-known ReLU network approximation results for Lipschitz functions and describe a refinement of the bit extraction technique that could be of independent interest.
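The sigma-delta analogy can be made concrete with the classic first-order modulator, which trades a higher sample count (oversampling) for one-bit resolution: averaging many ±1 outputs recovers the input to within 1/N. This sketch only illustrates the signal-processing idea the abstract refers to; it is not the paper's depth-precision construction, and the function name is hypothetical.

```python
def sigma_delta(samples):
    """First-order sigma-delta quantization of samples in [-1, 1] to {-1, +1}.

    The state u accumulates the running quantization error; it stays in
    [-1, 1], so the partial sums of the samples and of the output bits
    differ by at most 1.
    """
    u = 0.0
    bits = []
    for x in samples:
        v = u + x
        q = 1.0 if v >= 0 else -1.0
        u = v - q
        bits.append(q)
    return bits


n = 1000
bits = sigma_delta([0.3] * n)
print(sum(bits) / n)  # within 1/n of 0.3
```

Each output carries only one bit, but the error of the average shrinks as more samples are used, which is the kind of resolution-for-quantity trade the abstract draws a parallel to.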

[364]  arXiv:2405.01953 (cross-list from math.NT) [pdf, ps, other]
Title: Mahler equations for Zeckendorf numeration
Comments: 33 pages, 6 figures
Subjects: Number Theory (math.NT); Formal Languages and Automata Theory (cs.FL)

We define generalised equations of Z-Mahler type, based on the Zeckendorf numeration system. We show that if a sequence over a commutative ring is Z-regular, then it is the sequence of coefficients of a series which is a solution of a Z-Mahler equation. Conversely, if the Z-Mahler equation is isolating, then its solutions define Z-regular sequences. This is a generalisation of results of Becker and Dumas. We provide an example to show that there exist non-isolating Z-Mahler equations whose solutions do not define Z-regular sequences. Our proof yields a new construction of weighted automata that generate classical q-regular sequences.
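For readers unfamiliar with the numeration system, a minimal sketch of the Zeckendorf representation follows. Every non-negative integer is a unique sum of non-consecutive Fibonacci numbers, and the standard greedy algorithm finds it; Z-regular sequences and Z-Mahler equations are defined over these digit strings. The function names are hypothetical.

```python
def zeckendorf(n):
    """Zeckendorf digits of n >= 0, most significant first.

    Place values are the Fibonacci numbers 1, 2, 3, 5, 8, ...; the greedy
    choice guarantees that no two consecutive digits are both 1.
    """
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= n:
            digits.append(1)
            n -= f
        else:
            digits.append(0)
    # Drop leading zeros but keep a single 0 for n == 0.
    while len(digits) > 1 and digits[0] == 0:
        digits.pop(0)
    return digits


def from_zeckendorf(digits):
    """Inverse of zeckendorf(): sum of the Fibonacci place values of the 1s."""
    fibs = [1, 2]
    while len(fibs) < len(digits):
        fibs.append(fibs[-1] + fibs[-2])
    place_values = reversed(fibs[:len(digits)])
    return sum(f for f, d in zip(place_values, digits) if d)


print(zeckendorf(12))  # [1, 0, 1, 0, 1]  (8 + 3 + 1)
```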

[365]  arXiv:2405.01964 (cross-list from stat.ML) [pdf, other]
Title: Understanding LLMs Requires More Than Statistical Generalization
Comments: Accepted at ICML2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that autoregressive (AR) probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart -- and thus with equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.

[366]  arXiv:2405.01967 (cross-list from eess.AS) [pdf, other]
Title: Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world application should have low computational complexity and a low processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that meets these requirements and contrast monaural and binaural processing algorithms in two complex acoustic scenes. Both algorithms are evaluated with objective metrics and in experiments with hearing-impaired listeners performing a speech-in-noise test. Results are compared to two traditional enhancement strategies, i.e., adaptive differential microphone processing and binaural beamforming. While all algorithms perform similarly in diffuse noise, the binaural deep learning approach performs best in the presence of spatial interferers. A post-hoc analysis attributes this to improvements at low SNRs and to precise spatial filtering.

[367]  arXiv:2405.01984 (cross-list from math.OC) [pdf, other]
Title: A Penalty-Based Guardrail Algorithm for Non-Decreasing Optimization with Inequality Constraints
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)

Traditional mathematical programming solvers require long computational times to solve constrained minimization problems of complex and large-scale physical systems. Therefore, these problems are often transformed into unconstrained ones and solved with computationally efficient optimization approaches based on first-order information, such as the gradient descent method. However, once a problem is made unconstrained, balancing the minimization of the objective function with the reduction of constraint violations is challenging. We consider the class of time-dependent minimization problems with increasing, possibly nonlinear and non-convex, objective functions and non-decreasing, possibly nonlinear and non-convex, inequality constraints. To solve them efficiently, we propose a penalty-based guardrail algorithm (PGA). This algorithm adapts a standard penalty-based method by dynamically updating the right-hand side of the constraints with a guardrail variable that adds a margin to prevent violations. We evaluate PGA on two novel application domains: a simplified model of a district heating system and an optimization model derived from learned deep neural networks. Our method significantly outperforms mathematical programming solvers and the standard penalty-based method, and achieves better performance and faster convergence than a state-of-the-art algorithm (IPDD) within a specified time limit.
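The violation problem that motivates the guardrail can be seen on a toy problem: minimize $(x-3)^2$ subject to $x \le 1$. The sketch below runs gradient descent on a standard quadratic penalty, optionally with a fixed constraint margin. The margin value, penalty weight, step size and helper names are assumptions for illustration; PGA itself updates the margin dynamically rather than fixing it by hand.

```python
def penalty_descent(grad_f, g, grad_g, x0, mu, margin=0.0, lr=4e-3, steps=5000):
    """Gradient descent on f(x) + mu * max(0, g(x) + margin)^2 (1D, one constraint).

    With margin == 0 this is the standard quadratic penalty for g(x) <= 0.
    A positive margin tightens the constraint to g(x) <= -margin; here it is
    a fixed constant (a simplification, not the adaptive guardrail).
    """
    x = x0
    for _ in range(steps):
        violation = max(0.0, g(x) + margin)
        x -= lr * (grad_f(x) + 2.0 * mu * violation * grad_g(x))
    return x


def grad_f(x):
    return 2.0 * (x - 3.0)  # f(x) = (x - 3)^2


def g(x):
    return x - 1.0  # constraint g(x) <= 0, i.e. x <= 1


def grad_g(x):
    return 1.0


x_plain = penalty_descent(grad_f, g, grad_g, x0=0.0, mu=100.0)
x_margin = penalty_descent(grad_f, g, grad_g, x0=0.0, mu=100.0, margin=0.02)
print(x_plain)   # about 1.0198 = 103/101: the constraint x <= 1 is violated
print(x_margin)  # about 1.0: the margin absorbs the residual violation
```

With a finite penalty weight the minimizer of the penalized objective settles at $x = (3 + \mu(1-m))/(1+\mu)$, so without a margin ($m = 0$) it always lands slightly outside the feasible set.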

[368]  arXiv:2405.01994 (cross-list from stat.ML) [pdf, ps, other]
Title: Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery
Authors: Patrick Saux
Comments: Doctoral thesis. Some pdf readers (e.g. Firefox) have trouble rendering the theorems/definitions environment. When reading online, please prefer e.g. Chrome
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

This thesis aims to study some of the mathematical challenges that arise in the analysis of statistical sequential decision-making algorithms for postoperative patient follow-up. Stochastic bandits (multiarmed, contextual) model the learning of a sequence of actions (a policy) by an agent in an uncertain environment in order to maximise observed rewards. To learn optimal policies, bandit algorithms have to balance the exploitation of current knowledge and the exploration of uncertain actions. Such algorithms have mostly been studied and deployed in industrial applications with large datasets, low-risk decisions and clear modelling assumptions, such as clickthrough rate maximisation in online advertising. By contrast, digital health recommendations call for a whole new paradigm of small samples, risk-averse agents and complex, nonparametric modelling. To this end, we developed new safe, anytime-valid concentration bounds (Bregman, empirical Chernoff), introduced a new framework for risk-aware contextual bandits (with elicitable risk measures) and analysed a novel class of nonparametric bandit algorithms under weak assumptions (Dirichlet sampling). In addition to the theoretical guarantees, these results are supported by in-depth empirical evidence. Finally, as a first step towards personalised postoperative follow-up recommendations, we developed, together with medical doctors and surgeons, an interpretable machine learning model to predict the long-term weight trajectories of patients after bariatric surgery.
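As a point of reference for the exploration-exploitation trade-off described above, a sketch of the classic UCB1 index policy on Bernoulli arms follows. UCB1 is a standard textbook algorithm swapped in for illustration; it is not the thesis's risk-aware or Dirichlet-sampling method, and the arm probabilities and horizon are arbitrary.

```python
import math
import random


def ucb1(arm_probs, horizon, seed=0):
    """UCB1 on Bernoulli arms; returns how often each arm was pulled.

    Exploitation: the empirical mean of each arm. Exploration: a bonus
    sqrt(2 ln t / n_a) that shrinks as arm a is pulled more often.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


print(ucb1([0.2, 0.8], horizon=2000))  # the 0.8 arm receives most pulls
```

UCB1 assumes bounded rewards and targets expected reward only, which is one reason the small-sample, risk-averse health setting calls for different tools.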

[369]  arXiv:2405.02034 (cross-list from math.OC) [pdf, other]
Title: Multi-Agent Coverage Control on Surfaces Using Conformal Mapping
Authors: Chao Zhai, Yuming Wu
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

Real-time environmental monitoring using a multi-agent system (MAS) has long been a focal point of cooperative control. Providing cost-effective services for potential emergencies in surface environments remains a challenging task. This paper explores the transformation of a general surface into a two-dimensional (2D) disk through the construction of a conformal mapping. Multiple agents are strategically deployed within the mapped convex disk, and the resulting deployment is then mapped back to the original surface environment. This approach circumvents the intricacies of path planning. Technical analysis encompasses the design of distributed control laws and a method to eliminate distortions introduced by the mapping. Moreover, the developed coverage algorithm is applied to a scenario of monitoring surface deformation. Finally, the effectiveness of the proposed algorithm is validated through numerical simulations.

[370]  arXiv:2405.02082 (cross-list from stat.ML) [pdf, ps, other]
Title: A comparative study of conformal prediction methods for valid uncertainty quantification in machine learning
Authors: Nicolas Dewolf
Comments: At 339 pages, this document is a live/working version of my PhD dissertation published in 2024 by the University of Ghent (UGent)
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)

In the past decades, most work in the area of data analysis and machine learning was focused on optimizing predictive models and getting better results than what was possible with existing models. To what extent the metrics with which such improvements were measured accurately captured the intended goal, whether the numerical differences in the resulting values were significant, or whether uncertainty played a role and should have been taken into account, was of secondary importance. Whereas probability theory, be it frequentist or Bayesian, used to be the gold standard in science before the advent of the supercomputer, it was quickly displaced by black-box models and sheer computing power because of their ability to handle large data sets. This evolution unfortunately came at the expense of interpretability and trustworthiness. However, while people are still trying to improve the predictive power of their models, the community is starting to realize that for many applications it is not so much the exact prediction that is of importance, but rather the variability or uncertainty.
The work in this dissertation tries to further the quest for a world where everyone is aware of uncertainty, of how important it is and how to embrace it instead of fearing it. A specific, though general, framework that allows anyone to obtain accurate uncertainty estimates is singled out and analysed. Certain aspects and applications of the framework -- dubbed 'conformal prediction' -- are studied in detail. Whereas many approaches to uncertainty quantification make strong assumptions about the data, conformal prediction is, at the time of writing, the only framework that deserves the title 'distribution-free'. No parametric assumptions have to be made, and the nonparametric results also hold without having to resort to the law of large numbers in the asymptotic regime.
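A minimal sketch of split conformal prediction for regression, one common instance of the framework, shows where the distribution-free guarantee comes from: only a rank-based quantile of calibration residuals is used, with no parametric model of the noise. The function name and the absolute-residual score are assumptions for illustration, not the dissertation's code.

```python
import math


def split_conformal_interval(cal_residuals, y_hat, alpha):
    """Split conformal prediction interval around a point prediction y_hat.

    cal_residuals are absolute errors |y - f(x)| of a fitted model on a
    held-out calibration set. If calibration and test points are
    exchangeable, the interval covers the true y with probability >= 1 - alpha.
    """
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = sorted(cal_residuals)[k - 1]
    return (y_hat - q, y_hat + q)


residuals = [0.5, 1.2, 0.3, 0.9, 2.0, 0.7, 1.1, 0.4, 0.8]  # n = 9
print(split_conformal_interval(residuals, y_hat=10.0, alpha=0.1))  # (8.0, 12.0)
```

The $(n+1)$ correction in the quantile rank is what makes the coverage guarantee hold in finite samples rather than only asymptotically.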

[371]  arXiv:2405.02101 (cross-list from eess.SP) [pdf, other]
Title: Discrete Aware Matrix Completion via Convexized $\ell_0$-Norm Approximation
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

We propose a novel algorithm for the completion of partially observed low-rank matrices in a structured setting where each entry can be chosen from a finite discrete alphabet set, such as in common recommender systems. The proposed low-rank matrix completion (MC) method is an improved variant of the state-of-the-art (SotA) discrete-aware matrix completion method we previously proposed. In the new method, discreteness is enforced by an $\ell_0$-norm regularizer that is not replaced with the $\ell_1$-norm but is instead approximated by a continuous and differentiable function normalized via fractional programming (FP) under a proximal gradient (PG) framework. Simulation results demonstrate the superior performance of the new method compared to the SotA techniques as well as the earlier $\ell_1$-norm-based discrete-aware matrix completion approach.
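To illustrate how an $\ell_0$-norm regularizer can be replaced by a continuous and differentiable function, the sketch below uses the common fractional surrogate $\sum_i x_i^2/(x_i^2+\epsilon)$. This particular surrogate, its parameter $\epsilon$ and the function names are assumptions chosen for illustration; the paper's FP-normalized approximation and its discrete-alphabet regularizer may differ.

```python
def l0_surrogate(x, eps):
    """Smooth approximation of the l0 "norm": sum_i x_i^2 / (x_i^2 + eps).

    Each term is 0 at x_i = 0 and tends to 1 for |x_i| >> sqrt(eps), so the
    sum approaches the number of nonzero entries as eps -> 0.
    """
    return sum(v * v / (v * v + eps) for v in x)


def l0_surrogate_grad(x, eps):
    """Elementwise derivative 2 eps x_i / (x_i^2 + eps)^2, defined everywhere."""
    return [2 * eps * v / (v * v + eps) ** 2 for v in x]


print(l0_surrogate([0.0, 2.0, -3.0], eps=1e-6))  # close to 2
```

Unlike the $\ell_1$ relaxation, each term saturates at 1 for large entries, so large values are not penalized more heavily than small nonzero ones.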

[372]  arXiv:2405.02109 (cross-list from eess.IV) [pdf, ps, other]
Title: Three-Dimensional Amyloid-Beta PET Synthesis from Structural MRI with Conditional Generative Adversarial Networks
Comments: Abstract Submitted and Presented at the 2024 International Society of Magnetic Resonance in Medicine. Singapore, Singapore, May 4-9. Abstract Number 2239
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Motivation: Alzheimer's Disease hallmarks include amyloid-beta deposits and brain atrophy, detectable via PET and MRI scans, respectively. PET is expensive, invasive and exposes patients to ionizing radiation. MRI is cheaper, non-invasive, and free from ionizing radiation but limited to measuring brain atrophy.
Goal: To develop a 3D image translation model that synthesizes amyloid-beta PET images from T1-weighted MRI, exploiting the known relationship between amyloid-beta and brain atrophy.
Approach: The model was trained on 616 PET/MRI pairs and validated with 264 pairs.
Results: The model synthesized amyloid-beta PET images from T1-weighted MRI with a high degree of similarity, as shown by high SSIM and PSNR values (SSIM > 0.95, PSNR = 28).
Impact: Our model demonstrates the feasibility of synthesizing amyloid-beta PET images from structural MRI, significantly enhancing accessibility for large-cohort studies and early dementia detection while also reducing cost, invasiveness, and radiation exposure.
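For reference, PSNR, one of the two reported metrics, is a simple function of the mean squared error between the synthesized and real images. A minimal sketch over flattened intensity lists follows; the intensity range `max_val` (images scaled to [0, 1]) is an assumption, since the abstract does not state the normalization used. SSIM is more involved and is omitted.

```python
import math


def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 log10(max_val^2 / MSE)."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)


print(psnr([0.0, 1.0], [0.1, 1.0]))  # MSE = 0.005, so about 23.01 dB
```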

[373]  arXiv:2405.02124 (cross-list from eess.AS) [pdf, other]
Title: TIPAA-SSL: Text Independent Phone-to-Audio Alignment based on Self-Supervised Learning and Knowledge Transfer
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (wav2vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model, and a frame-level phoneme classifier trained on forced-alignment labels (obtained with the Montreal Forced Aligner) to produce multilingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state of the art (charsiu) in statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work, but the design of the system makes it easily adaptable to other languages.

[374]  arXiv:2405.02131 (cross-list from eess.SP) [pdf, other]
Title: Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are computationally expensive, which prevents their adoption in strict real-time computational imaging problems, such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have recently been proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model that is trained to reproduce the effects of human motions on the EM field and incorporates EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.

[375]  arXiv:2405.02188 (cross-list from stat.ML) [pdf, other]
Title: Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is that its regret analysis results are pessimistic: although the cost function can change from one episode to the next, its evolution in many settings is not adversarial. To address this, we introduce and study a new variant of AMDP, which aims to minimize regret while utilizing a set of cost predictors. For this setting, we develop a new policy search method that achieves a sublinear optimistic regret with high probability, that is, a regret bound that degrades gracefully with the estimation power of the cost predictors. Establishing such optimistic regret bounds is nontrivial given that (i) as we demonstrate, the existing importance-weighted cost estimators cannot establish optimistic bounds, and (ii) the feedback model of AMDP differs from (and is more realistic than) that of existing optimistic online learning works. Our result, in particular, hinges upon developing a novel optimistically biased cost estimator that leverages cost predictors and enables a high-probability regret analysis without imposing restrictive assumptions. We further discuss practical extensions of the proposed scheme and demonstrate its efficacy numerically.

[376]  arXiv:2405.02201 (cross-list from math.OC) [pdf, other]
Title: Regularized Q-learning through Robust Averaging
Comments: 26 pages, 5 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.
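The estimation bias mentioned above stems from the maximum in Watkins' update target. The sketch below shows the standard tabular update and a small Monte Carlo check that taking the maximum of noisy, unbiased estimates overestimates the true maximum (here 0). It illustrates the baseline only; the 2RA distributionally robust estimator is not reproduced, and the function names and parameter values are hypothetical.

```python
import random
from collections import defaultdict


def watkins_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Watkins Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def max_of_noisy_estimates(n_trials=10000, n_actions=2, seed=0):
    """Average of max_a Qhat(a) when every true action value is 0 and each
    estimate carries N(0, 1) noise: the result is positive (overestimation)."""
    rng = random.Random(seed)
    return sum(max(rng.gauss(0.0, 1.0) for _ in range(n_actions))
               for _ in range(n_trials)) / n_trials


q = defaultdict(float)
watkins_update(q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
print(q[(0, 1)])                 # 0.5
print(max_of_noisy_estimates())  # positive, near 1/sqrt(pi) ~ 0.56 for two actions
```

Each estimate is unbiased on its own, yet their maximum is biased upward; the bias grows with the noise level and the number of actions.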

[377]  arXiv:2405.02208 (cross-list from eess.IV) [pdf, other]
Title: Reference-Free Image Quality Metric for Degradation and Reconstruction Artifacts
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Image Quality Assessment (IQA) is essential in various Computer Vision tasks such as image deblurring and super-resolution. However, most IQA methods require reference images, which are not always available. While there are some reference-free IQA metrics, they have limitations in simulating human perception and discerning subtle image quality variations. We hypothesize that the JPEG quality factor is representative of image quality, and that a well-trained neural network can learn to accurately evaluate image quality without requiring a clean reference, as it can recognize image degradation artifacts based on prior knowledge. Thus, we developed a reference-free quality evaluation network, dubbed "Quality Factor (QF) Predictor". Our QF Predictor is a lightweight, fully convolutional network comprising seven layers. The model is trained in a self-supervised manner: it receives a JPEG-compressed image patch with a random QF as input and is trained to accurately predict the corresponding QF. We demonstrate the versatility of the model by applying it to various tasks. First, our QF Predictor can generalize to measure the severity of various image artifacts, such as Gaussian blur and Gaussian noise. Second, we show that the QF Predictor can be trained to predict the undersampling rate of images reconstructed from Magnetic Resonance Imaging (MRI) data.

[378]  arXiv:2405.02225 (cross-list from stat.ML) [pdf, other]
Title: Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks
Comments: 28 pages, 8 figures, accepted by ICML2024
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Methodology (stat.ME)

This paper introduces a framework for post-processing machine learning models so that their predictions satisfy multi-group fairness guarantees. Based on the celebrated notion of multicalibration, we introduce $(\mathbf{s},\mathcal{G},\alpha)$-GMC (Generalized Multi-Dimensional Multicalibration) for multi-dimensional mappings $\mathbf{s}$, constraint set $\mathcal{G}$, and a pre-specified threshold level $\alpha$. We propose associated algorithms to achieve this notion in general settings. This framework is then applied to diverse scenarios encompassing different fairness concerns, including false negative rate control in image segmentation, prediction set conditional uncertainty quantification in hierarchical classification, and de-biased text generation in language models. We conduct numerical studies on several datasets and tasks.
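The classic notion that GMC generalizes can be illustrated with a simplified multicalibration check: the average residual must be small within every (group, prediction-level) cell, not just overall. This unweighted, binned version and its function name are assumptions for illustration; the paper's $(\mathbf{s},\mathcal{G},\alpha)$-GMC is considerably more general.

```python
from collections import defaultdict


def multicalibration_error(preds, labels, groups, n_bins=10):
    """Largest |mean(label - pred)| over (group, prediction-bin) cells.

    groups[i] is the set of group ids that example i belongs to (groups may
    overlap). In this simplified, unweighted sense, a predictor is
    alpha-multicalibrated when the returned value is at most alpha.
    """
    resid = defaultdict(float)
    count = defaultdict(int)
    for p, y, gs in zip(preds, labels, groups):
        b = min(int(p * n_bins), n_bins - 1)  # bin predictions in [0, 1]
        for g in gs:
            resid[(g, b)] += y - p
            count[(g, b)] += 1
    return max(abs(resid[c]) / count[c] for c in count)


# Calibrated overall (mean label 0.5), but not within groups A and B:
print(multicalibration_error([0.5] * 4, [1, 0, 1, 0],
                             [{"A"}, {"B"}, {"A"}, {"B"}]))  # 0.5
```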

[379]  arXiv:2405.02231 (cross-list from stat.ME) [pdf, other]
Title: Efficient spline orthogonal basis for representation of density functions
Subjects: Methodology (stat.ME); Numerical Analysis (math.NA)

Probability density functions form a specific class of functional data objects with intrinsic properties of scale invariance and relative scale, characterized by the unit integral constraint. The Bayes spaces methodology respects their specific nature, and the centred log-ratio transformation enables processing such functional data in the standard Lebesgue space of square-integrable functions. As the data representing densities are frequently observed in discrete form, the focus has been on their spline representation. Therefore, the crucial step in the approximation is to construct a proper spline basis reflecting their specific properties. Since centred log-ratio transformed densities form a subspace of functions with a zero integral constraint, the standard $B$-spline basis is no longer suitable. Recently, a new spline basis incorporating this zero integral property, called $Z\!B$-splines, was developed. However, this basis is not orthogonal, a property that is beneficial from both computational and application points of view. In this paper, we describe an efficient method for constructing an orthogonal $Z\!B$-spline basis, called $Z\!B$-splinets. The advantages of the $Z\!B$-splinet approach are, foremost, computational efficiency and the locality of basis supports, which is desirable for data interpretability, e.g. in the context of functional principal component analysis. The proposed approach is demonstrated on an empirical demographic dataset.
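For densities sampled on a uniform grid, the centred log-ratio transformation has a simple discrete counterpart, sketched below; it shows the zero-sum (discrete zero-integral) constraint that makes a standard B-spline basis unsuitable, as well as the scale invariance. This is a discretized illustration with hypothetical function names; the paper works with the continuous clr transform in Bayes spaces and its spline representation.

```python
import math


def clr(values):
    """Discrete centred log-ratio transform: log(x_i) minus the mean log.

    The output sums to zero (the discrete analogue of the zero-integral
    constraint) and is unchanged if all inputs are rescaled by a constant.
    """
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]


def clr_inverse(coords):
    """Map clr coordinates back to a composition that sums to 1."""
    exps = [math.exp(c) for c in coords]
    total = sum(exps)
    return [e / total for e in exps]


density = [0.1, 0.2, 0.3, 0.4]  # density values on a uniform grid
print(sum(clr(density)))        # 0 up to rounding
```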

[380]  arXiv:2405.02268 (cross-list from math.DG) [pdf, ps, other]
Title: The injectivity radius of the compact Stiefel manifold under the Euclidean metric
Comments: 10 pages
Subjects: Differential Geometry (math.DG); Numerical Analysis (math.NA)

The injectivity radius of a manifold is an important quantity, both from a theoretical point of view and in terms of numerical applications. It is the largest possible radius within which all geodesics are unique and length-minimizing. In consequence, it is the largest possible radius within which calculations in Riemannian normal coordinates are well-defined. A matrix manifold that arises frequently in a wide range of practical applications is the compact Stiefel manifold of orthogonal $p$-frames in $\mathbb{R}^n$. We observe that geodesics on this manifold are space curves of constant Frenet curvatures. Using this fact, we prove that the injectivity radius on the Stiefel manifold under the Euclidean metric is $\pi$.

Replacements for Mon, 6 May 24

[381]  arXiv:1511.04143 (replaced) [pdf, other]
Title: Deep Reinforcement Learning in Parameterized Action Space
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)
[382]  arXiv:2102.10156 (replaced) [pdf, other]
Title: Learning to Persuade on the Fly: Robustness Against Ignorance
Comments: Accepted at Operations Research. Preliminary version appeared as an extended abstract in EC 2021
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
[383]  arXiv:2104.02438 (replaced) [pdf, other]
Title: Orbit-Finite-Dimensional Vector Spaces and Weighted Register Automata
Comments: Journal Version for TheoretiCS
Journal-ref: TheoretiCS, Volume 3 (2024), Article 13, 1-41
Subjects: Formal Languages and Automata Theory (cs.FL)
[384]  arXiv:2107.12416 (replaced) [pdf, other]
Title: Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent
Comments: The arxiv version contains proofs of Lemma 3 and Lemma 5, which are missing in the published version
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
[385]  arXiv:2109.07319 (replaced) [pdf, other]
Title: InceptionXML: A Lightweight Framework with Synchronized Negative Sampling for Short Text Extreme Classification
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[386]  arXiv:2110.00744 (replaced) [pdf, ps, other]
Title: Random Subgraph Detection Using Queries
Comments: 27 pages
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
[387]  arXiv:2111.05726 (replaced) [pdf, other]
Title: The Cut-and-Play Algorithm: Computing Nash Equilibria via Outer Approximations
Subjects: Optimization and Control (math.OC); Computer Science and Game Theory (cs.GT)
[388]  arXiv:2111.15000 (replaced) [pdf, other]
Title: Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
Comments: This was published in CVPR 2022
Journal-ref: 2022 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[389]  arXiv:2201.01288 (replaced) [pdf, other]
Title: Automated Graph Machine Learning: Approaches, Libraries, Benchmarks and Directions
Comments: 20 pages, 4 figures. arXiv admin note: text overlap with arXiv:2103.00742
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[390]  arXiv:2201.02401 (replaced) [pdf, ps, other]
Title: Tight Fine-Grained Bounds for Direct Access on Join Queries
Subjects: Databases (cs.DB); Computational Complexity (cs.CC)
[391]  arXiv:2202.07053 (replaced) [pdf, ps, other]
Title: Rank-one Boolean tensor factorization and the multilinear polytope
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM)
[392]  arXiv:2205.00400 (replaced) [pdf, other]
Title: Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization
Comments: ICME2023
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[393]  arXiv:2205.04522 (replaced) [pdf, other]
Title: Assessing Confidence with Assurance 2.0
Comments: Second Edition
Subjects: Artificial Intelligence (cs.AI)
[394]  arXiv:2207.14682 (replaced) [pdf, other]
Title: Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks
Comments: Published at MMFORWILD 2022, ICPR Workshops - Code: this https URL. International Conference on Pattern Recognition. Cham: Springer Nature Switzerland, 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[395]  arXiv:2207.14686 (replaced) [pdf, other]
Title: Forensic License Plate Recognition with Compression-Informed Transformers
Comments: Published at ICIP 2022, Code: this https URL
Journal-ref: In IEEE International Conference on Image Processing (ICIP), pp. 406-410. IEEE, 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[396]  arXiv:2208.04188 (replaced) [pdf, ps, other]
Title: A quadratic estimation for the Kühnel conjecture on embeddings
Comments: 23 pages, no figures, minor revision, Remark 4.1.b corrected
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Algebraic Topology (math.AT); Geometric Topology (math.GT)
[397]  arXiv:2210.00314 (replaced) [pdf, other]
Title: Learning Hierarchical Image Segmentation For Recognition and By Recognition
Comments: ICLR 2024 (spotlight). First two authors contributed equally. Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[398]  arXiv:2210.03990 (replaced) [pdf, other]
Title: Weisfeiler-Lehman goes Dynamic: An Analysis of the Expressive Power of Graph Neural Networks for Attributed and Dynamic Graphs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[399]  arXiv:2210.13386 (replaced) [pdf, other]
Title: Contraction of Locally Differentially Private Mechanisms
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Statistics Theory (math.ST); Machine Learning (stat.ML)
[400]  arXiv:2301.03381 (replaced) [pdf, ps, other]
Title: Space-Time FEM for the Vectorial Wave Equation under Consideration of Ohm's Law
Comments: 41 pages
Subjects: Numerical Analysis (math.NA)
[401]  arXiv:2301.03865 (replaced) [pdf, other]
Title: Contact graphs of boxes with unidirectional contacts
Comments: Minor change
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[402]  arXiv:2301.06813 (replaced) [pdf, other]
Title: AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
Comments: Accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS) 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[403]  arXiv:2301.07281 (replaced) [pdf, other]
Title: Detecting and Ranking Causal Anomalies in End-to-End Complex System
Subjects: Machine Learning (cs.LG)
[404]  arXiv:2301.12809 (replaced) [pdf, other]
Title: The Hidden Power of Pure 16-bit Floating-Point Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)
[405]  arXiv:2302.00890 (replaced) [pdf, other]
Title: Neural Common Neighbor with Completion for Link Prediction
Comments: ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
[406]  arXiv:2302.01068 (replaced) [pdf, other]
Title: FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations
Comments: To appear in PoPETS'24
Subjects: Machine Learning (cs.LG)
[407]  arXiv:2302.03577 (replaced) [pdf, ps, other]
Title: Compressed sensing for inverse problems and the sample complexity of the sparse Radon transform
Comments: 57 pages
Subjects: Functional Analysis (math.FA); Information Theory (cs.IT); Optimization and Control (math.OC)
[408]  arXiv:2302.11885 (replaced) [pdf, other]
Title: The Joint Weighted Average (JWA) Operator
Subjects: Artificial Intelligence (cs.AI)
[409]  arXiv:2303.07290 (replaced) [pdf, ps, other]
Title: Finding Diverse Minimum s-t Cuts
Comments: An earlier version of this work appeared at the 34th International Symposium on Algorithms and Computation (ISAAC 2023). Corrected typos in Section 3 and revised arguments in Section 4. Results unchanged. Added new complexity results in Section 5
Subjects: Data Structures and Algorithms (cs.DS)
[410]  arXiv:2303.07563 (replaced) [pdf, other]
Title: Bounded-Confidence Models of Opinion Dynamics with Adaptive Confidence Bounds
Comments: revised version; 45 pages
Subjects: Social and Information Networks (cs.SI); Dynamical Systems (math.DS); Probability (math.PR); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph)
[411]  arXiv:2303.10396 (replaced) [pdf, other]
Title: Towards Diverse Binary Segmentation via A Simple yet General Gated Network
Comments: Accepted by IJCV 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[412]  arXiv:2304.01397 (replaced) [pdf, other]
Title: LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models
Subjects: Software Engineering (cs.SE)
[413]  arXiv:2304.01596 (replaced) [pdf, ps, other]
Title: Mentions of Prejudice in News Media -- An International Comparison
Authors: David Rozado
Subjects: Computers and Society (cs.CY)
[414]  arXiv:2304.03688 (replaced) [pdf, other]
Title: Graph Parameters, Universal Obstructions, and WQO
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[415]  arXiv:2304.14778 (replaced) [pdf, ps, other]
Title: Metric Temporal Equilibrium Logic over Timed Traces
Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)
Subjects: Artificial Intelligence (cs.AI)
[416]  arXiv:2305.03357 (replaced) [pdf, other]
Title: Persistent homology of directed spaces
Subjects: Algebraic Topology (math.AT); Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO)
[417]  arXiv:2305.03614 (replaced) [pdf, other]
Title: Denoising-Diffusion Alignment for Continuous Sign Language Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[418]  arXiv:2305.06058 (replaced) [pdf, other]
Title: Compressing neural network by tensor network with exponentially fewer variational parameters
Comments: 6 pages, 3 figures for the main text and 3 pages for the appendices
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[419]  arXiv:2305.06630 (replaced) [pdf, other]
Title: Predictive change point detection for heterogeneous data
Subjects: Machine Learning (cs.LG)
[420]  arXiv:2305.07052 (replaced) [pdf, other]
Title: A Framework for the Design and Realization of Alternative Superconducting Quantum Architectures
Comments: 6 pages, 5 figures
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
[421]  arXiv:2305.16602 (replaced) [pdf, other]
Title: Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
Comments: 25 Pages, 4 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[422]  arXiv:2306.01875 (replaced) [pdf, other]
Title: DiffECG: A Versatile Probabilistic Diffusion Model for ECG Signals Synthesis
Comments: Accepted in IEEE SERA 2024 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[423]  arXiv:2306.06999 (replaced) [pdf, other]
Title: Temporal Reachability Dominating Sets: contagion in temporal graphs
Comments: 38 pages, 17 figures
Subjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC); Combinatorics (math.CO)
[424]  arXiv:2306.07465 (replaced) [pdf, other]
Title: A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
Comments: 26 Pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[425]  arXiv:2306.08068 (replaced) [pdf, other]
Title: DORSal: Diffusion for Object-centric Representations of Scenes et al
Comments: Accepted to ICLR 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[426]  arXiv:2306.13723 (replaced) [pdf, other]
[427]  arXiv:2307.02511 (replaced) [pdf, other]
Title: Automating Computational Design with Generative AI
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[428]  arXiv:2307.05167 (replaced) [pdf, other]
Title: A non-custodial wallet for digital currency: design challenges and opportunities
Comments: 29 pages, 12 figures
Subjects: Computers and Society (cs.CY)
[429]  arXiv:2307.05641 (replaced) [pdf, other]
Title: Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets
Comments: published at Interspeech 2023 - Code: this https URL
Journal-ref: Proc. INTERSPEECH 2023, 5057-5061 (2023)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[430]  arXiv:2307.06472 (replaced) [pdf, other]
Title: Early Autism Diagnosis based on Path Signature and Siamese Unsupervised Feature Compressor
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
[431]  arXiv:2307.07660 (replaced) [pdf, other]
Title: Zip-zip Trees: Making Zip Trees More Balanced, Biased, Compact, or Persistent
Authors: Ofek Gila (1), Michael T. Goodrich (1), Robert E. Tarjan (2) ((1) University of California, Irvine, (2) Princeton University)
Comments: v2 to appear in the journal Algorithmica, 24 pages, 9 figures
Subjects: Data Structures and Algorithms (cs.DS)
[432]  arXiv:2307.08643 (replaced) [pdf, other]
Title: Corruptions of Supervised Learning Problems: Typology and Mitigations
Comments: 56 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[433]  arXiv:2307.12371 (replaced) [pdf, other]
Title: PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization
Comments: LREC-COLING 2024, Turin (Italy), 20-25 May, 2024
Subjects: Computation and Language (cs.CL)
[434]  arXiv:2307.12512 (replaced) [pdf, other]
Title: XRLoc: Accurate UWB Localization to Realize XR Deployments
Comments: This paper is accepted by ACM SenSys 2023. The published version is this https URL in ACM Digital Library
Journal-ref: Proceedings of ACM Conference on Embedded Networked Sensor Systems (ACM SenSys'23), pp.459-473, 2023
Subjects: Human-Computer Interaction (cs.HC); Networking and Internet Architecture (cs.NI); Robotics (cs.RO); Signal Processing (eess.SP)
[435]  arXiv:2308.09110 (replaced) [pdf, other]
Title: JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer
Comments: 15 pages, 9 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[436]  arXiv:2308.09381 (replaced) [pdf, other]
Title: On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
Subjects: Machine Learning (cs.LG)
[437]  arXiv:2308.14716 (replaced) [pdf, ps, other]
Title: Local Lipschitz Filters for Bounded-Range Functions with Applications to Arbitrary Real-Valued Functions
Subjects: Data Structures and Algorithms (cs.DS)
[438]  arXiv:2309.04504 (replaced) [pdf, other]
Title: Compositional Learning of Visually-Grounded Concepts Using Reinforcement
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[439]  arXiv:2309.05030 (replaced) [pdf, other]
Title: Decolonial AI Alignment: Openness, Viśeṣa-Dharma, and Including Excluded Knowledges
Authors: Kush R. Varshney
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[440]  arXiv:2309.05249 (replaced) [pdf, other]
Title: Evaluating Visual Odometry Methods for Autonomous Driving in Rain
Comments: A version of the paper presented at IEEE International Conference on Automation Science and Engineering (CASE) 2023. Fixed grammar and phrasing to improve clarity of the statements made. Emphasized the need for a more robust sensor-fusion-based approach to localization in rain for autonomous driving
Subjects: Robotics (cs.RO)
[441]  arXiv:2309.10953 (replaced) [pdf, other]
Title: Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces
Comments: Revisions made in accordance with the reviewers' requests. The revision primarily includes a detailed statement of our contribution, a justification of our multiscale approach, further explanations of our RL problem setup, numerical experiments, and relevant references in response to reviewers' comments. 29 pages; 9 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[442]  arXiv:2309.12284 (replaced) [pdf, other]
Title: MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Comments: To appear at ICLR 2024 (Spotlight). Project Page: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[443]  arXiv:2309.14246 (replaced) [pdf, other]
Title: Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[444]  arXiv:2309.16263 (replaced) [pdf, ps, other]
Title: Cooperation Dynamics in Multi-Agent Systems: Exploring Game-Theoretic Scenarios with Mean-Field Equilibria
Comments: Accepted for MADGames: Multi-Agent Dynamic Games Workshop at IROS 2023, see details at this https URL
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
[445]  arXiv:2309.17176 (replaced) [pdf, other]
Title: AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[446]  arXiv:2310.01329 (replaced) [pdf, other]
Title: BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Comments: ICLR 2024 camera-ready version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[447]  arXiv:2310.04420 (replaced) [pdf, other]
Title: BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity
Comments: ICLR 2024. Project page: this https URL
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[448]  arXiv:2310.05336 (replaced) [pdf, other]
Title: GReAT: A Graph Regularized Adversarial Training Method
Comments: 25 pages including references. 7 figures and 6 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[449]  arXiv:2310.05597 (replaced) [pdf, other]
Title: Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance
Subjects: Computation and Language (cs.CL)
[450]  arXiv:2310.05801 (replaced) [pdf, other]
Title: An operator preconditioning perspective on training in physics-informed machine learning
Subjects: Machine Learning (cs.LG)
[451]  arXiv:2310.06555 (replaced) [pdf, other]
Title: It's About Time: Temporal References in Emergent Communication
Comments: 26 pages main body and 36 pages supplementary material, 8 figures in main body. Code available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[452]  arXiv:2310.07626 (replaced) [pdf, other]
Title: Learning of Sea Surface Height Interpolation from Multi-variate Simulated Satellite Observations
Comments: submitted to JAMES. 32 pages, major revision
Subjects: Machine Learning (cs.LG)
[453]  arXiv:2310.09653 (replaced) [pdf, other]
Title: SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Comments: Accepted at ICML 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[454]  arXiv:2310.11052 (replaced) [pdf, other]
Title: Investigating Threats Posed by SMS Origin Spoofing to IoT Devices
Authors: Akaki Tsunoda
Subjects: Cryptography and Security (cs.CR)
[455]  arXiv:2310.11760 (replaced) [pdf, ps, other]
Title: Performance Investigation of an Optimal Control Strategy for Zero-Emission Operations of Shipboard Microgrids
Comments: Submitted to SPEEDAM 2024
Subjects: Systems and Control (eess.SY)
[456]  arXiv:2310.11884 (replaced) [pdf, other]
Title: From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks
Comments: Accepted in Neurosymbolic Artificial Intelligence
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[457]  arXiv:2310.16960 (replaced) [pdf, other]
Title: Privately Aligning Language Models with Reinforcement Learning
Comments: Accepted at ICLR 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[458]  arXiv:2311.00653 (replaced) [pdf, other]
Title: Integrating measures of replicability into scholarly search: Challenges and opportunities
Subjects: Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC)
[459]  arXiv:2311.01957 (replaced) [pdf, ps, other]
Title: Distributed online constrained convex optimization with event-triggered communication
Comments: 12 pages, 3 figures
Subjects: Optimization and Control (math.OC); Multiagent Systems (cs.MA)
[460]  arXiv:2311.03703 (replaced) [pdf, other]
Title: Practical Performance Guarantees for Pipelined DNN Inference
Comments: 17 pages, 5 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[461]  arXiv:2311.04037 (replaced) [pdf, other]
Title: Causal Discovery Under Local Privacy
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
[462]  arXiv:2311.04157 (replaced) [pdf, other]
Title: A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
Comments: Accepted to International Conference on Learning Representations 2024 (ICLR 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[463]  arXiv:2311.07794 (replaced) [pdf, ps, other]
Title: How to Use Quantum Indistinguishability Obfuscation
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
[464]  arXiv:2311.09047 (replaced) [pdf, other]
Title: 6G Non-Terrestrial Networks Enabled Low-Altitude Economy: Opportunities and Challenges
Comments: This paper has been submitted to IEEE for possible publication
Subjects: Information Theory (cs.IT)
[465]  arXiv:2311.09441 (replaced) [pdf, other]
Title: Exploring the Privacy-Energy Consumption Tradeoff for Split Federated Learning
Comments: 7 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[466]  arXiv:2311.10065 (replaced) [pdf, other]
Title: Visual Environment Assessment for Safe Autonomous Quadrotor Landing
Comments: 7 pages, 5 figures, 1 table, 2024 International Conference on Unmanned Aircraft Systems (ICUAS)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[467]  arXiv:2311.10329 (replaced) [pdf, other]
Title: High-fidelity Person-centric Subject-to-Image Synthesis
Comments: Accepted by CVPR2024. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[468]  arXiv:2311.11871 (replaced) [pdf, other]
Title: Training robust and generalizable quantum models
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Optimization and Control (math.OC)
[469]  arXiv:2311.12410 (replaced) [pdf, other]
Title: nach0: Multimodal Natural and Chemical Languages Foundation Model
Comments: Accepted to Chemical Science Journal. Models are publicly available via this https URL and this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[470]  arXiv:2311.16834 (replaced) [pdf, other]
Title: FocusLearn: Fully-Interpretable, High-Performance Modular Neural Networks for Time Series
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[471]  arXiv:2311.17983 (replaced) [pdf, other]
Title: Improving Interpretation Faithfulness for Vision Transformers
Comments: Accepted by ICML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[472]  arXiv:2311.18763 (replaced) [pdf, other]
Title: Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters
Comments: CVPR-W 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[473]  arXiv:2312.01441 (replaced) [pdf, other]
Title: Koopman-based feedback design with stability guarantees
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[474]  arXiv:2312.03682 (replaced) [pdf, other]
Title: What Planning Problems Can A Relational Neural Network Solve?
Comments: NeurIPS 2023 (Spotlight). Project page: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[475]  arXiv:2312.03853 (replaced) [pdf, other]
Title: Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[476]  arXiv:2312.05176 (replaced) [pdf, other]
Title: MRI Scan Synthesis Methods based on Clustering and Pix2Pix
Comments: Accepted at AIME 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[477]  arXiv:2312.06425 (replaced) [pdf, other]
Title: Numeric Truncation Security Predicate
Journal-ref: 2023 Ivannikov ISPRAS Open Conference (ISPRAS), IEEE, 2023, pp. 84-92
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[478]  arXiv:2312.06733 (replaced) [pdf, other]
Title: TULIP: Transformer for Upsampling of LiDAR Point Clouds
Comments: The paper was accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[479]  arXiv:2312.06947 (replaced) [pdf, other]
Title: MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
Comments: 13 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[480]  arXiv:2312.08189 (replaced) [pdf, other]
Title: GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements
Journal-ref: Proceedings of the 16th Annual ACM India Compute Conference (2023) 55-60
Subjects: Software Engineering (cs.SE)
[481]  arXiv:2312.09400 (replaced) [pdf, ps, other]
Title: Refuting approaches to the log-rank conjecture for XOR functions
Comments: Added additional background and intuition
Subjects: Computational Complexity (cs.CC)
[482]  arXiv:2312.12267 (replaced) [pdf, other]
Title: Optimal Power Flow Pursuit via Feedback-based Safe Gradient Flow
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
[483]  arXiv:2312.13859 (replaced) [pdf, other]
Title: Nonlinear Functional Estimation: Functional Detectability and Full Information Estimation
Comments: 15 pages, 3 figures
Subjects: Systems and Control (eess.SY)
[484]  arXiv:2312.17279 (replaced) [pdf, other]
Title: Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Comments: Shorter version accepted to ICASSP 2024
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[485]  arXiv:2401.01508 (replaced) [pdf, other]
Title: Practical Guidelines for the Selection and Evaluation of Natural Language Processing Techniques in Requirements Engineering
Comments: This article will appear as Chapter 15 in a book titled "Handbook of Natural Language Processing for Requirements Engineering", to be published by Springer
Subjects: Software Engineering (cs.SE)
[486]  arXiv:2401.05235 (replaced) [pdf, other]
Title: A Survey on Optimization Studies of Group Centrality Metrics
Subjects: Social and Information Networks (cs.SI); Optimization and Control (math.OC)
[487]  arXiv:2401.08501 (replaced) [pdf, other]
Title: ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation
Comments: ICLR 2024 (oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[488]  arXiv:2401.08788 (replaced) [pdf, other]
Title: The Impact of Differential Feature Under-reporting on Algorithmic Fairness
Comments: ACM Conference on Fairness, Accountability, and Transparency (FAccT 2024)
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
[489]  arXiv:2401.10314 (replaced) [pdf, other]
Title: LangProp: A code optimization framework using Large Language Models applied to driving
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[490]  arXiv:2401.10647 (replaced) [pdf, other]
Title: Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models
Comments: Under review. this https URL
Subjects: Computation and Language (cs.CL)
[491]  arXiv:2401.10990 (replaced) [pdf, other]
Title: A Nonlinear Observer Design for the Discrete-time Systems: Exploiting Matrix-Multiplier-based LMI Approach
Authors: Shivaraj Mohite
Subjects: Systems and Control (eess.SY)
[492]  arXiv:2401.14482 (replaced) [pdf, other]
Title: The geodesic dispersion phenomenon in random fields dynamics
Comments: arXiv admin note: text overlap with arXiv:2111.03905
Subjects: Information Theory (cs.IT); Mathematical Physics (math-ph); Computational Physics (physics.comp-ph)
[493]  arXiv:2402.00957 (replaced) [pdf, other]
Title: Credal Learning Theory
Comments: 19 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[494]  arXiv:2402.01864 (replaced) [pdf, other]
Title: (A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice
Comments: 14 pages
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[495]  arXiv:2402.03015 (replaced) [pdf, other]
Title: Open-separating dominating codes in graphs
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[496]  arXiv:2402.03328 (replaced) [pdf, other]
Title: Visual Enumeration is Challenging for Large-scale Generative AI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[497]  arXiv:2402.04074 (replaced) [pdf, other]
Title: Mean-Square Stability and Stabilizability for LTI and Stochastic Systems Connected in Feedback
Subjects: Systems and Control (eess.SY)
[498]  arXiv:2402.08021 (replaced) [pdf, other]
Title: Careless Whisper: Speech-to-Text Hallucination Harms
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
[499]  arXiv:2402.08289 (replaced) [pdf, ps, other]
Title: Why Studying Cut-ins? Comparing Cut-ins and Other Lane Changes Based on Naturalistic Driving Data
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[500]  arXiv:2402.09330 (replaced) [pdf, other]
Title: 3D-based RNA function prediction tools in rnaglib
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
[501]  arXiv:2402.11319 (replaced) [pdf, other]
Title: Hysteresis Compensation of Flexible Continuum Manipulator using RGBD Sensing and Temporal Convolutional Network
Comments: 8 pages, 11 figures, 5 tables
Journal-ref: IEEE Robotics and Automation Letters (2024)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[502]  arXiv:2402.11359 (replaced) [pdf, other]
Title: Offline Training of Language Model Agents with Functions as Learnable Weights
Comments: 22 pages, 10 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[503]  arXiv:2402.11553 (replaced) [pdf, other]
Title: On the Limits of Information Spread by Memory-less Agents
Authors: Niccolò D'Archivio (1), Robin Vacus (2) ((1) INRIA, (2) Bocconi University)
Comments: 26 pages, 4 figures
Subjects: Multiagent Systems (cs.MA); Distributed, Parallel, and Cluster Computing (cs.DC)
[504]  arXiv:2402.12285 (replaced) [pdf, other]
Title: Capturing the Shape of a Point Set with a Line Segment
Subjects: Computational Geometry (cs.CG)
[505]  arXiv:2402.14095 (replaced) [pdf, other]
Title: Zero-shot generalization across architectures for visual classification
Comments: Accepted as a Tiny Paper at ICLR 2024. Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[506]  arXiv:2402.15942 (replaced) [pdf, other]
Title: Minimum energy density steering of linear systems with Gromov-Wasserstein terminal cost
Comments: 7 pages
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[507]  arXiv:2402.18554 (replaced) [pdf, other]
Title: Extended Kalman filter -- Koopman operator for tractable stochastic optimal control
Comments: 6 pages
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[508]  arXiv:2402.19379 (replaced) [pdf, other]
Title: Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy
Comments: 20 pages; 13 visualizations (nine figures, four tables)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[509]  arXiv:2403.00462 (replaced) [pdf, other]
Title: LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues
Comments: Accepted at NAACL SRW 2024
Subjects: Computation and Language (cs.CL)
[510]  arXiv:2403.02545 (replaced) [pdf, other]
Title: Wukong: Towards a Scaling Law for Large-Scale Recommendation
Comments: 12 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[511]  arXiv:2403.03134 (replaced) [pdf, other]
Title: Simplicity in Complexity: Explaining Visual Complexity using Deep Segmentation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
[512]  arXiv:2403.05103 (replaced) [pdf, ps, other]
Title: Safe Pareto Improvements for Expected Utility Maximizers in Program Games
Comments: 19 pages, 4 figures
Subjects: Computer Science and Game Theory (cs.GT)
[513]  arXiv:2403.07507 (replaced) [pdf, other]
Title: Reconstructions of Jupiter's magnetic field using physics informed neural networks
Subjects: Earth and Planetary Astrophysics (astro-ph.EP); Machine Learning (cs.LG)
[514]  arXiv:2403.09891 (replaced) [pdf, other]
Title: Fisher Mask Nodes for Language Model Merging
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[515]  arXiv:2403.10659 (replaced) [pdf, other]
Title: Towards Practical Fabrication Stage Attacks Using Interrupt-Resilient Hardware Trojans
Subjects: Cryptography and Security (cs.CR)
[516]  arXiv:2403.11894 (replaced) [pdf, other]
Title: From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality?
Comments: This paper has been accepted by Computational and Structural Biotechnology Journal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[517]  arXiv:2403.12619 (replaced) [pdf, other]
Title: Detection of Malicious Agents in Social Learning
Subjects: Social and Information Networks (cs.SI); Multiagent Systems (cs.MA); Signal Processing (eess.SP)
[518]  arXiv:2403.16369 (replaced) [pdf, other]
Title: Learning Action-based Representations Using Invariance
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[519]  arXiv:2403.16584 (replaced) [pdf, other]
Title: Can Large Language Models (or Humans) Disentangle Text?
Comments: To appear as: Nicolas Audinet de Pieuchon, Adel Daoud, Connor T. Jerzak, Moa Johansson, Richard Johansson. Can Large Language Models (or Humans) Disentangle Text? In: Sixth Workshop on NLP and Computational Social Science at NAACL, 2024
Subjects: Computation and Language (cs.CL)
[520]  arXiv:2403.17701 (replaced) [src]
Title: Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation
Comments: Experimental method encountered errors, undergoing experiment again
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[521]  arXiv:2403.17731 (replaced) [pdf, other]
Title: Coupled Boundary and Volume Integral Equations for Electromagnetic Scattering
Subjects: Numerical Analysis (math.NA)
[522]  arXiv:2404.00462 (replaced) [pdf, other]
Title: Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models
Comments: Presented at the Back to the Future-Robot Learning Going Probabilistic Workshop, co-located with ICRA 2024. this https URL
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
[523]  arXiv:2404.02894 (replaced) [pdf, other]
Title: Automated Transparency: A Legal and Empirical Analysis of the Digital Services Act Transparency Database
Comments: accepted to FAccT 2024; camera-ready version; 19 pages
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
[524]  arXiv:2404.03263 (replaced) [pdf, other]
Title: On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
Comments: ICLR 2024. 5th Workshop on Practical ML for Low Resource Settings (PML4LRS). Code can be found at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[525]  arXiv:2404.04562 (replaced) [pdf, other]
Title: Diffusion Time-step Curriculum for One Image to 3D Generation
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[526]  arXiv:2404.05688 (replaced) [pdf, other]
Title: David and Goliath: An Empirical Evaluation of Attacks and Defenses for QNNs at the Deep Edge
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[527]  arXiv:2404.05840 (replaced) [pdf, ps, other]
Title: Attention-Driven Multi-Agent Reinforcement Learning: Enhancing Decisions with Expertise-Informed Tasks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[528]  arXiv:2404.06202 (replaced) [pdf, other]
Title: Automated National Urban Map Extraction
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[529]  arXiv:2404.06423 (replaced) [pdf, other]
Title: Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints
Comments: 6 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[530]  arXiv:2404.06484 (replaced) [pdf, other]
Title: Public-private funding models in open source software development: A case study on scikit-learn
Authors: Cailean Osborne
Comments: 15 pages
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
[531]  arXiv:2404.06508 (replaced) [pdf, other]
Title: On the Effect of (Near) Duplicate Subwords in Language Modelling
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[532]  arXiv:2404.06919 (replaced) [pdf, other]
Title: Longitudinal Analysis and Quantitative Assessment of Child Development through Mobile Interaction
Comments: 13 pages, 5 figures, 7 tables, 46 references
Subjects: Human-Computer Interaction (cs.HC)
[533]  arXiv:2404.09411 (replaced) [pdf, other]
Title: Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers
Comments: To appear at the Forty-first International Conference on Machine Learning (ICML2024)
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Genomics (q-bio.GN)
[534]  arXiv:2404.10073 (replaced) [pdf, other]
Title: Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stress Identification
Comments: 21 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[535]  arXiv:2404.12639 (replaced) [pdf, other]
Title: Single-Task Continual Offline Reinforcement Learning
Comments: 8 pages, 10 figures
Subjects: Machine Learning (cs.LG)
[536]  arXiv:2404.13236 (replaced) [pdf, other]
Title: LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models
Comments: Paper accepted at IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) IEEE, Osaka, Japan (2024)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)
[537]  arXiv:2404.13414 (replaced) [pdf, other]
Title: Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study
Comments: Accepted to Learning @ Scale 2024
Subjects: Human-Computer Interaction (cs.HC)
[538]  arXiv:2404.13630 (replaced) [pdf, ps, other]
Title: Utilizing Deep Learning to Optimize Software Development Processes
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[539]  arXiv:2404.15523 (replaced) [pdf, other]
Title: Understanding Hyperbolic Metric Learning through Hard Negative Sampling
Comments: published in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[540]  arXiv:2404.15782 (replaced) [pdf, ps, other]
Title: Risk or Chance? Large Language Models and Reproducibility in Human-Computer Interaction Research
Subjects: Human-Computer Interaction (cs.HC)
[541]  arXiv:2404.16051 (replaced) [pdf, other]
Title: TimeFlows: Visualizing Process Chronologies from Vast Collections of Heterogeneous Information Objects
Comments: 16 pages, accepted at RCIS 2024
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
[542]  arXiv:2404.16549 (replaced) [pdf, other]
Title: Application of Long-Short Term Memory and Convolutional Neural Networks for Real-Time Bridge Scour Prediction
Subjects: Machine Learning (cs.LG)
[543]  arXiv:2404.17129 (replaced) [pdf, other]
Title: Process Mining Embeddings: Learning Vector Representations for Petri Nets
Subjects: Artificial Intelligence (cs.AI)
[544]  arXiv:2404.17644 (replaced) [pdf, other]
Title: A Conditional Independence Test in the Presence of Discretization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[545]  arXiv:2404.17699 (replaced) [pdf, other]
Title: Deep Learning for Melt Pool Depth Contour Prediction From Surface Thermal Images via Vision Transformers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[546]  arXiv:2404.17818 (replaced) [pdf, ps, other]
Title: Automatic Build Repair for Test Cases using Incompatible Java Versions
Comments: 44 pages, 22 figures (incl. tables and listings); To be published in Information and Software Technology. Link to artifact is available within the paper
Subjects: Software Engineering (cs.SE)
[547]  arXiv:2404.17830 (replaced) [pdf, other]
Title: Dynamic Against Dynamic: An Open-set Self-learning Framework
Comments: The first two authors contributed equally to this work. Accepted at IJCAI2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[548]  arXiv:2404.17858 (replaced) [pdf, other]
Title: Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion
Comments: 10 pages, 6 figures
Subjects: Computation and Language (cs.CL)
[549]  arXiv:2404.17862 (replaced) [pdf, other]
Title: Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum
Comments: 10 pages, 4 figures
Subjects: Computation and Language (cs.CL)
[550]  arXiv:2404.18381 (replaced) [pdf, other]
Title: Object Registration in Neural Fields
Comments: Accepted to ICRA 2024 RoboNeRF workshop. 5 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:2402.09722
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[551]  arXiv:2404.19351 (replaced) [pdf, other]
Title: Deep Learning Forecasts Caldera Collapse Events at Kilauea Volcano
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG)
[552]  arXiv:2404.19725 (replaced) [pdf, other]
Title: Fairness Without Demographics in Human-Centered Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[553]  arXiv:2405.00015 (replaced) [pdf, other]
Title: Experiences Porting Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-study
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[554]  arXiv:2405.00080 (replaced) [pdf, other]
Title: Recommendation aided Caching using Combinatorial Multi-armed Bandits
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Networking and Internet Architecture (cs.NI)
[555]  arXiv:2405.00332 (replaced) [pdf, other]
Title: A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[556]  arXiv:2405.00338 (replaced) [pdf, other]
Title: Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model
Comments: 10 pages, 2 figures
Subjects: Information Retrieval (cs.IR)
[557]  arXiv:2405.00465 (replaced) [pdf, other]
Title: BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine
Subjects: Computation and Language (cs.CL)
[558]  arXiv:2405.00610 (replaced) [pdf, ps, other]
Title: Growth in products of matrices: fastest, average, and generic
Comments: 10 pages. Comments are welcome
Subjects: Group Theory (math.GR); Cryptography and Security (cs.CR); Combinatorics (math.CO); Dynamical Systems (math.DS); Probability (math.PR)
[559]  arXiv:2405.00711 (replaced) [pdf, other]
Title: Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[560]  arXiv:2405.00929 (replaced) [pdf, other]
Title: Quantum wave packet transforms with compact frequency support
Subjects: Quantum Physics (quant-ph); Numerical Analysis (math.NA)
[561]  arXiv:2405.01014 (replaced) [pdf, ps, other]
Title: Proven Runtime Guarantees for How the MOEA/D Computes the Pareto Front From the Subproblem Solutions
Subjects: Neural and Evolutionary Computing (cs.NE)
[562]  arXiv:2405.01022 (replaced) [pdf, other]
Title: UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[563]  arXiv:2405.01031 (replaced) [pdf, other]
Title: The Privacy Power of Correlated Noise in Decentralized Learning
Comments: Accepted as conference paper at ICML 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
[564]  arXiv:2405.01080 (replaced) [pdf, other]
Title: KDPrint: Passive Authentication using Keystroke Dynamics-to-Image Encoding via Standardization
Comments: 12 pages, 7 figures
Subjects: Cryptography and Security (cs.CR)
[565]  arXiv:2405.01196 (replaced) [pdf, other]
Title: Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[566]  arXiv:2405.01327 (replaced) [pdf, other]
Title: Constrained Reinforcement Learning Under Model Mismatch
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)
[567]  arXiv:2405.01329 (replaced) [pdf, other]
Title: Decentralization of Ethereum's Builder Market
Subjects: Cryptography and Security (cs.CR)
[568]  arXiv:2405.01443 (replaced) [pdf, ps, other]
Title: On the existence of approximate problems that preserve the type of a bifurcation point of a nonlinear problem. Application to the stationary Navier-Stokes equations
Subjects: Numerical Analysis (math.NA); Functional Analysis (math.FA)
[569]  arXiv:2405.01461 (replaced) [pdf, other]
Title: SATO: Stable Text-to-Motion Framework
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[570]  arXiv:2405.01477 (replaced) [pdf, other]
Title: "Sometimes You Just Gotta Risk It for the Biscuit": A Portrait of Student Risk-Taking
Comments: 7 pages, 1 figure, 4 tables
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)
[571]  arXiv:2405.01507 (replaced) [pdf, other]
Title: Accelerating Convergence in Bayesian Few-Shot Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[572]  arXiv:2405.01524 (replaced) [pdf, other]
Title: A separability-based approach to quantifying generalization: which layer is best?
Comments: 6 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)