Computer Science
- [1] arXiv:2405.06643 [pdf, ps, other]
-
Title: Levels of AI Agents: from Rules to Large Language ModelsSubjects: Computation and Language (cs.CL)
AI agents are defined as artificial entities to perceive the environment, make decisions and take actions. Inspired by the 6 levels of autonomous driving by Society of Automotive Engineers, the AI agents are also categorized based on utilities and strongness, as the following levels: L0, no AI, with tools taking into account perception plus actions; L1, using rule-based AI; L2, making rule-based AI replaced by IL/RL-based AI, with additional reasoning & decision making; L3, applying LLM-based AI instead of IL/RL-based AI, additionally setting up memory & reflection; L4, based on L3, facilitating autonomous learning & generalization; L5, based on L4, appending personality of emotion and character and collaborative behavior with multi-agents.
- [2] arXiv:2405.06646 [pdf, ps, other]
-
Title: On-the-fly Learning to Transfer Motion Style with Diffusion Models: A Semantic Guidance ApproachComments: 23 pagesSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
In recent years, the emergence of generative models has spurred development of human motion generation, among which the generation of stylized human motion has consistently been a focal point of research. The conventional approach for stylized human motion generation involves transferring the style from given style examples to new motions. Despite decades of research in human motion style transfer, it still faces three main challenges: 1) difficulties in decoupling the motion content and style; 2) generalization to unseen motion style. 3) requirements of dedicated motion style dataset; To address these issues, we propose an on-the-fly human motion style transfer learning method based on the diffusion model, which can learn a style transfer model in a few minutes of fine-tuning to transfer an unseen style to diverse content motions. The key idea of our method is to consider the denoising process of the diffusion model as a motion translation process that learns the difference between the style-neutral motion pair, thereby avoiding the challenge of style and content decoupling. Specifically, given an unseen style example, we first generate the corresponding neutral motion through the proposed Style-Neutral Motion Pair Generation module. We then add noise to the generated neutral motion and denoise it to be close to the style example to fine-tune the style transfer diffusion model. We only need one style example and a text-to-motion dataset with predominantly neutral motion (e.g. HumanML3D). The qualitative and quantitative evaluations demonstrate that our method can achieve state-of-the-art performance and has practical applications.
- [3] arXiv:2405.06650 [pdf, ps, other]
-
Title: Large Language Models as Planning Domain GeneratorsComments: Published at ICAPS 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at this https URL.
- [4] arXiv:2405.06652 [pdf, ps, other]
-
Title: Large Language Model (LLM) AI text generation detection based on transformer deep learning algorithmComments: 6 pagesSubjects: Computation and Language (cs.CL)
In this paper, a tool for detecting LLM AI text generation is developed based on the Transformer model, aiming to improve the accuracy of AI text generation detection and provide reference for subsequent research. Firstly the text is Unicode normalised, converted to lowercase form, characters other than non-alphabetic characters and punctuation marks are removed by regular expressions, spaces are added around punctuation marks, first and last spaces are removed, consecutive ellipses are replaced with single spaces and the text is connected using the specified delimiter. Next remove non-alphabetic characters and extra whitespace characters, replace multiple consecutive whitespace characters with a single space and again convert to lowercase form. The deep learning model combines layers such as LSTM, Transformer and CNN for text classification or sequence labelling tasks. The training and validation sets show that the model loss decreases from 0.127 to 0.005 and accuracy increases from 94.96 to 99.8, indicating that the model has good detection and classification ability for AI generated text. The test set confusion matrix and accuracy show that the model has 99% prediction accuracy for AI-generated text, with a precision of 0.99, a recall of 1, and an f1 score of 0.99, achieving a very high classification accuracy. Looking forward, it has the prospect of wide application in the field of AI text detection.
- [5] arXiv:2405.06656 [pdf, ps, other]
-
Title: Exploring Social Media Posts for Depression Identification: A Study on Reddit DatasetComments: Accepted as a poster in IndiaHCI 2023Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Depression is one of the most common mental disorders affecting an individual's personal and professional life. In this work, we investigated the possibility of utilizing social media posts to identify depression in individuals. To achieve this goal, we conducted a preliminary study where we extracted and analyzed the top Reddit posts made in 2022 from depression-related forums. The collected data were labeled as depressive and non-depressive using UMLS Metathesaurus. Further, the pre-processed data were fed to classical machine learning models, where we achieved an accuracy of 92.28\% in predicting the depressive and non-depressive posts.
- [6] arXiv:2405.06664 [pdf, ps, other]
-
Title: A categorical account of composition methods in logic (extended version)Comments: This is an extended version of arXiv:2304.10196 which, apart from providing full proofs of all statements, takes a more categorical point of view to tell the whole story. In particular, we highlight and explain the underlying categorical constructions in detailSubjects: Logic in Computer Science (cs.LO); Category Theory (math.CT)
We present a categorical theory of the composition methods in finite model theory -- a key technique enabling modular reasoning about complex structures by building them out of simpler components. The crucial results required by the composition methods are Feferman--Vaught--Mostowski (FVM) type theorems, which characterize how logical equivalence behaves under composition and transformation of models.
Our results are developed by extending the recently introduced game comonad semantics for model comparison games. This level of abstraction allow us to give conditions yielding FVM type results in a uniform way. Our theorems are parametric in the classes of models, logics and operations involved. Furthermore, they naturally account for the existential and positive existential fragments, and extensions with counting quantifiers of these logics. We also reveal surprising connections between FVM type theorems, and classical concepts in the theory of monads.
We illustrate our methods by recovering many classical theorems of practical interest, including a refinement of a previous result by Dawar, Severini, and Zapata concerning the 3-variable counting logic and cospectrality. To highlight the importance of our techniques being parametric in the logic of interest, we prove a family of FVM theorems for products of structures, uniformly in the logic in question, which cannot be done using specific game arguments.
This is an extended version of the LiCS 2023 conference paper of the same name. - [7] arXiv:2405.06665 [pdf, ps, html, other]
-
Title: Enhancing Language Models for Financial Relation Extraction with Named Entities and Part-of-SpeechComments: Accepted to ICLR 2024 Tiny Paper TrackSubjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
The Financial Relation Extraction (FinRE) task involves identifying the entities and their relation, given a piece of financial statement/text. To solve this FinRE problem, we propose a simple but effective strategy that improves the performance of pre-trained language models by augmenting them with Named Entity Recognition (NER) and Part-Of-Speech (POS), as well as different approaches to combine these information. Experiments on a financial relations dataset show promising results and highlights the benefits of incorporating NER and POS in existing models. Our dataset and codes are available at this https URL.
- [8] arXiv:2405.06667 [pdf, ps, html, other]
-
Title: Sentiment Polarity Analysis of Bangla Food Reviews Using Machine and Deep Learning AlgorithmsSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
The Internet has become an essential tool for people in the modern world. Humans, like all living organisms, have essential requirements for survival. These include access to atmospheric oxygen, potable water, protective shelter, and sustenance. The constant flux of the world is making our existence less complicated. A significant portion of the population utilizes online food ordering services to have meals delivered to their residences. Although there are numerous methods for ordering food, customers sometimes experience disappointment with the food they receive. Our endeavor was to establish a model that could determine if food is of good or poor quality. We compiled an extensive dataset of over 1484 online reviews from prominent food ordering platforms, including Food Panda and HungryNaki. Leveraging the collected data, a rigorous assessment of various deep learning and machine learning techniques was performed to determine the most accurate approach for predicting food quality. Out of all the algorithms evaluated, logistic regression emerged as the most accurate, achieving an impressive 90.91% accuracy. The review offers valuable insights that will guide the user in deciding whether or not to order the food.
- [9] arXiv:2405.06668 [pdf, ps, html, other]
-
Title: Exposing and Explaining Fake News On-the-FlyFrancisco de Arriba-Pérez, Silvia García-Méndez, Fátima Leal, Benedita Malheiro, Juan Carlos BurguilloJournal-ref: Mach Learn (2024)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Social media platforms enable the rapid dissemination and consumption of information. However, users instantly consume such content regardless of the reliability of the shared data. Consequently, the latter crowdsourcing model is exposed to manipulation. This work contributes with an explainable and online classification method to recognize fake news in real-time. The proposed method combines both unsupervised and supervised Machine Learning approaches with online created lexica. The profiling is built using creator-, content- and context-based features using Natural Language Processing techniques. The explainable classification mechanism displays in a dashboard the features selected for classification and the prediction confidence. The performance of the proposed solution has been validated with real data sets from Twitter and the results attain 80 % accuracy and macro F-measure. This proposal is the first to jointly provide data stream processing, profiling, classification and explainability. Ultimately, the proposed early detection, isolation and explanation of fake news contribute to increase the quality and trustworthiness of social media contents.
- [10] arXiv:2405.06669 [pdf, ps, html, other]
-
Title: Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call TranscriptsComments: Accepted in SIGIR 2024Subjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Information Retrieval (cs.IR); Machine Learning (cs.LG)
While automatic summarization techniques have made significant advancements, their primary focus has been on summarizing short news articles or documents that have clear structural patterns like scientific articles or government reports. There has not been much exploration into developing efficient methods for summarizing financial documents, which often contain complex facts and figures. Here, we study the problem of bullet point summarization of long Earning Call Transcripts (ECTs) using the recently released ECTSum dataset. We leverage an unsupervised question-based extractive module followed by a parameter efficient instruction-tuned abstractive module to solve this task. Our proposed model FLAN-FinBPS achieves new state-of-the-art performances outperforming the strongest baseline with 14.88% average ROUGE score gain, and is capable of generating factually consistent bullet point summaries that capture the important facts discussed in the ECTs.
- [11] arXiv:2405.06670 [pdf, ps, html, other]
-
Title: TLINet: Differentiable Neural Network Temporal Logic InferenceSubjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG)
There has been a growing interest in extracting formal descriptions of the system behaviors from data. Signal Temporal Logic (STL) is an expressive formal language used to describe spatial-temporal properties with interpretability. This paper introduces TLINet, a neural-symbolic framework for learning STL formulas. The computation in TLINet is differentiable, enabling the usage of off-the-shelf gradient-based tools during the learning process. In contrast to existing approaches, we introduce approximation methods for max operator designed specifically for temporal logic-based gradient techniques, ensuring the correctness of STL satisfaction evaluation. Our framework not only learns the structure but also the parameters of STL formulas, allowing flexible combinations of operators and various logical structures. We validate TLINet against state-of-the-art baselines, demonstrating that our approach outperforms these baselines in terms of interpretability, compactness, rich expressibility, and computational efficiency.
- [12] arXiv:2405.06671 [pdf, ps, other]
-
Title: Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral LabellingSubhendu Khatuya, Rajdeep Mukherjee, Akash Ghosh, Manjunath Hegde, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan GoyalSubjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in the financial documents with their corresponding XBRL tags. Different from prior works, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata information to frame our target outputs while proposing a parameter efficient solution for the task using LoRA. We perform experiments on two recently released financial numeric labeling datasets. Our proposed model, FLAN-FinXC, achieves new state-of-the-art performances on both the datasets, outperforming several strong baselines. We explain the better scores of our proposed model by demonstrating its capability for zero-shot as well as the least frequently occurring tags. Also, even when we fail to predict the XBRL tags correctly, our generated output has substantial overlap with the ground-truth in majority of the cases.
- [13] arXiv:2405.06673 [pdf, ps, other]
-
Title: Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health RecordsComments: The 6th Clinical Natural Language Processing Workshop at NAACL 2024; Minor Change from Camera-ReadySubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Electronic Health Records (EHRs) are relational databases that store the entire medical histories of patients within hospitals. They record numerous aspects of patients' medical care, from hospital admission and diagnosis to treatment and discharge. While EHRs are vital sources of clinical data, exploring them beyond a predefined set of queries requires skills in query languages like SQL. To make information retrieval more accessible, one strategy is to build a question-answering system, possibly leveraging text-to-SQL models that can automatically translate natural language questions into corresponding SQL queries and use these queries to retrieve the answers. The EHRSQL 2024 shared task aims to advance and promote research in developing a question-answering system for EHRs using text-to-SQL modeling, capable of reliably providing requested answers to various healthcare professionals to improve their clinical work processes and satisfy their needs. Among more than 100 participants who applied to the shared task, eight teams completed the entire shared task processes and demonstrated a wide range of methods to effectively solve this task. In this paper, we describe the task of reliable text-to-SQL modeling, the dataset, and the methods and results of the participants. We hope this shared task will spur further research and insights into developing reliable question-answering systems for EHRs.
- [14] arXiv:2405.06674 [pdf, ps, html, other]
-
Title: Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present \ours, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the \openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as \textbf{Variable-length Open DB Schema}, \textbf{Target Column Truncation}, and \textbf{Example Column Truncation}, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54\% to 41.04\% and Code Llama-7B from 14.54\% to 48.24\% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35\%) on the BIRD-Dev dataset.
- [15] arXiv:2405.06676 [pdf, ps, other]
-
Title: EDA Corpus: A Large Language Model Dataset for Enhanced Interaction with OpenROADBing-Yue Wu, Utsav Sharma, Sai Rahul Dhanvi Kankipati, Ajay Yadav, Bintu Kappil George, Sai Ritish Guntupalli, Austin Rovinski, Vidya A. ChhabriaComments: Under review at Workshop on LLM-Aided Design (LAD'24)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Large language models (LLMs) serve as powerful tools for design, providing capabilities for both task automation and design assistance. Recent advancements have shown tremendous potential for facilitating LLM integration into the chip design process; however, many of these works rely on data that are not publicly available and/or not permissively licensed for use in LLM training and distribution. In this paper, we present a solution aimed at bridging this gap by introducing an open-source dataset tailored for OpenROAD, a widely adopted open-source EDA toolchain. The dataset features over 1000 data points and is structured in two formats: (i) a pairwise set comprised of question prompts with prose answers, and (ii) a pairwise set comprised of code prompts and their corresponding OpenROAD scripts. By providing this dataset, we aim to facilitate LLM-focused research within the EDA domain. The dataset is available at this https URL.
- [16] arXiv:2405.06677 [pdf, ps, html, other]
-
Title: ATG: Benchmarking Automated Theorem Generation for Generative Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without the new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses with the exponentially growing search space. Therefore, this paper proposes an Automated Theorem Generation (ATG) benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems that are applicable for downstream theorem proving as reusable knowledge. Specifically, we construct the ATG benchmark by splitting the Metamath library into three sets: axioms, library, and problem based on their proving depth. We conduct extensive experiments to investigate whether current LMs can generate theorems in the library and benefit the problem theorems proving. The results demonstrate that high-quality ATG data facilitates models' performances on downstream ATP. However, there is still room for current LMs to develop better ATG and generate more advanced and human-like theorems. We hope the new ATG challenge can shed some light on advanced complex theorem proving.
- [17] arXiv:2405.06680 [pdf, ps, html, other]
-
Title: Exploring the Compositional Deficiency of Large Language Models in Mathematical ReasoningSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Human cognition exhibits systematic compositionality, the algebraic ability to generate infinite novel combinations from finite learned components, which is the key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset \textsc{MathTrap}\footnotemark[3] by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8k. Since problems with logical flaws are quite rare in the real world, these represent ``unseen'' cases to LLMs. Solving these requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) knowledge related to the introduced traps. Our experiments show that while LLMs possess both components of requisite knowledge, they do not \textbf{spontaneously} combine them to handle these novel cases. We explore several methods to mitigate this deficiency, such as natural language prompts, few-shot demonstrations, and fine-tuning. We find that LLMs' performance can be \textbf{passively} improved through the above external intervention. Overall, systematic compositionality remains an open challenge for large language models.
- [18] arXiv:2405.06681 [pdf, ps, html, other]
-
Title: Leveraging Lecture Content for Improved Feedback: Explorations with GPT-4 and Retrieval Augmented GenerationComments: accepted at CSEE&T 2024: 36th International Conference on Software Engineering Education and Training, Würzburg, GermanySubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
This paper presents the use of Retrieval Augmented Generation (RAG) to improve the feedback generated by Large Language Models for programming tasks. For this purpose, corresponding lecture recordings were transcribed and made available to the Large Language Model GPT-4 as external knowledge source together with timestamps as metainformation by using RAG. The purpose of this is to prevent hallucinations and to enforce the use of the technical terms and phrases from the lecture. In an exercise platform developed to solve programming problems for an introductory programming lecture, students can request feedback on their solutions generated by GPT-4. For this task GPT-4 receives the students' code solution, the compiler output, the result of unit tests and the relevant passages from the lecture notes available through the use of RAG as additional context. The feedback generated by GPT-4 should guide students to solve problems independently and link to the lecture content, using the time stamps of the transcript as meta-information. In this way, the corresponding lecture videos can be viewed immediately at the corresponding positions. For the evaluation, students worked with the tool in a workshop and decided for each feedback whether it should be extended by RAG or not. First results based on a questionnaire and the collected usage data show that the use of RAG can improve feedback generation and is preferred by students in some situations. Due to the slower speed of feedback generation, the benefits are situation dependent.
- [19] arXiv:2405.06682 [pdf, ps, html, other]
-
Title: Self-Reflection in LLM Agents: Effects on Problem-Solving PerformanceSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this study, we investigated the effects of self-reflection in large language models (LLMs) on problem-solving performance. We instructed nine popular LLMs to answer a series of multiple-choice questions to provide a performance baseline. For each incorrectly answered question, we instructed eight types of self-reflecting LLM agents to reflect on their mistakes and provide themselves with guidance to improve problem-solving. Then, using this guidance, each self-reflecting agent attempted to re-answer the same questions. Our results indicate that LLM agents are able to significantly improve their problem-solving performance through self-reflection ($p < 0.001$). In addition, we compared the various types of self-reflection to determine their individual contribution to performance. All code and data are available on GitHub at this https URL
- [20] arXiv:2405.06683 [pdf, ps, html, other]
-
Title: ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and PersonalizationComments: Draft PaperSubjects: Computation and Language (cs.CL)
Retrieval-augmented generation (RAG) for language models significantly improves language understanding systems. The basic retrieval-then-read pipeline of response generation has evolved into a more extended process due to the integration of various components, sometimes even forming loop structures. Despite its advancements in improving response accuracy, challenges like poor retrieval quality for complex questions that require the search of multifaceted semantic information, inefficiencies in knowledge re-retrieval during long-term serving, and lack of personalized responses persist. Motivated by transcending these limitations, we introduce ERAGent, a cutting-edge framework that embodies an advancement in the RAG area. Our contribution is the introduction of the synergistically operated module: Enhanced Question Rewriter and Knowledge Filter, for better retrieval quality. Retrieval Trigger is incorporated to curtail extraneous external knowledge retrieval without sacrificing response quality. ERAGent also personalizes responses by incorporating a learned user profile. The efficiency and personalization characteristics of ERAGent are supported by the Experiential Learner module which makes the AI assistant being capable of expanding its knowledge and modeling user profile incrementally. Rigorous evaluations across six datasets and three question-answering tasks prove ERAGent's superior accuracy, efficiency, and personalization, emphasizing its potential to advance the RAG field and its applicability in practical systems.
- [21] arXiv:2405.06684 [pdf, ps, other]
-
Title: QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact AssessmentSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Social media aids disaster response but suffers from noise, hindering accurate impact assessment and decision making for resilient cities, which few studies considered. To address the problem, this study proposes the first domain-specific LLM model and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs considering their relationship to the physical and social impacts of earthquakes, and a dataset comprising 7282 earthquake-related microblogs from twenty earthquakes in different locations is developed as well. Then, with a systematic analysis of various influential factors, QuakeBERT, a domain-specific large language model (LLM), is developed and fine-tuned for accurate classification and filtering of microblogs. Meanwhile, an integrated method integrating public opinion trend analysis, sentiment analysis, and keyword-based physical impact quantification is introduced to assess both the physical and social impacts of earthquakes based on social media texts. Experiments show that data diversity and data volume dominate the performance of QuakeBERT and increase the macro average F1 score by 27%, while the best classification model QuakeBERT outperforms the CNN- or RNN-based models by improving the macro average F1 score from 60.87% to 84.33%. Finally, the proposed approach is applied to assess two earthquakes with the same magnitude and focal depth. Results show that the proposed approach can effectively enhance the impact assessment process by accurate detection of noisy microblogs, which enables effective post-disaster emergency responses to create more resilient cities.
- [22] arXiv:2405.06685 [pdf, ps, html, other]
-
Title: Multigenre AI-powered Story CompositionSubjects: Computation and Language (cs.CL)
This paper shows how to construct genre patterns, whose purpose is to guide interactive story composition in a way that enforces thematic consistency. To start the discussion we argue, based on previous seminal works, for the existence of five fundamental genres, namely comedy, romance - in the sense of epic plots, flourishing since the twelfth century -, tragedy, satire, and mystery. To construct the patterns, a simple two-phase process is employed: first retrieving examples that match our genre characterizations, and then applying a form of most specific generalization to the groups of examples in order to find their commonalities. In both phases, AI agents are instrumental, with our PatternTeller prototype being called to operate the story composition process, offering the opportunity to generate stories from a given premise of the user, to be developed under the guidance of the chosen pattern and trying to accommodate the user's suggestions along the composition stages.
- [23] arXiv:2405.06686 [pdf, ps, html, other]
-
Title: Word2World: Generating Stories and Worlds through Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) have proven their worth across a diverse spectrum of disciplines. LLMs have shown great potential in Procedural Content Generation (PCG) as well, but directly generating a level through a pre-trained LLM is still challenging. This work introduces Word2World, a system that enables LLMs to procedurally design playable games through stories, without any task-specific fine-tuning. Word2World leverages the abilities of LLMs to create diverse content and extract information. Combining these abilities, LLMs can create a story for the game, design narrative, and place tiles in appropriate places to create coherent worlds and playable games. We test Word2World with different LLMs and perform a thorough ablation study to validate each step. We open-source the code at this https URL.
- [24] arXiv:2405.06687 [pdf, ps, other]
-
Title: Hire Me or Not? Examining Language Model's Behavior with Occupation AttributesComments: Under reviewSubjects: Computation and Language (cs.CL)
With the impressive performance in various downstream tasks, large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems. A known issue of models trained on natural language data is the presence of human biases, which can impact the fairness of the system. This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making. Our framework is designed to investigate and quantify the presence of gender stereotypes in LLMs' behavior via multi-round question answering. Inspired by prior works, we construct a dataset by leveraging a standard occupation classification knowledge base released by authoritative agencies. We tested three LLMs (RoBERTa-large, GPT-3.5-turbo, and Llama2-70b-chat) and found that all models exhibit gender stereotypes analogous to human biases, but with different preferences. The distinct preferences of GPT-3.5-turbo and Llama2-70b-chat may imply the current alignment methods are insufficient for debiasing and could introduce new biases contradicting the traditional gender stereotypes.
- [25] arXiv:2405.06689 [pdf, ps, html, other]
-
Title: Policy Iteration for Pareto-Optimal Policies in Stochastic Stackelberg GamesComments: 21 pagesSubjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
In general-sum stochastic games, a stationary Stackelberg equilibrium (SSE) does not always exist, in which the leader maximizes leader's return for all the initial states when the follower takes the best response against the leader's policy. Existing methods of determining the SSEs require strong assumptions to guarantee the convergence and the coincidence of the limit with the SSE. Moreover, our analysis suggests that the performance at the fixed points of these methods is not reasonable when they are not SSEs. Herein, we introduced the concept of Pareto-optimality as a reasonable alternative to SSEs. We derive the policy improvement theorem for stochastic games with the best-response follower and propose an iterative algorithm to determine the Pareto-optimal policies based on it. Monotone improvement and convergence of the proposed approach are proved, and its convergence to SSEs is proved in a special case.
- [26] arXiv:2405.06691 [pdf, ps, html, other]
-
Title: Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle FilteringComments: 11 pages, 1 figure, 4 tablesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Large language models (LLMs) have significantly evolved, moving from simple output generation to complex reasoning and from stand-alone usage to being embedded into broader frameworks. In this paper, we introduce \emph{Fleet of Agents (FoA)}, a novel framework utilizing LLMs as agents to navigate through dynamic tree searches, employing a genetic-type particle filtering approach. FoA spawns a multitude of agents, each exploring autonomously, followed by a selection phase where resampling based on a heuristic value function optimizes the balance between exploration and exploitation. This mechanism enables dynamic branching, adapting the exploration strategy based on discovered solutions. We experimentally validate FoA using two benchmark tasks, "Game of 24" and "Mini-Crosswords". FoA outperforms the previously proposed Tree-of-Thoughts method in terms of efficacy and efficiency: it significantly decreases computational costs (by calling the value function less frequently) while preserving comparable or even superior accuracy.
- [27] arXiv:2405.06692 [pdf, ps, html, other]
-
Title: Analyzing Language Bias Between French and English in Conventional Multilingual Sentiment Analysis ModelsComments: Undergraduate Research ProjectSubjects: Computation and Language (cs.CL)
Inspired by the 'Bias Considerations in Bilingual Natural Language Processing' report by Statistics Canada, this study delves into potential biases in multilingual sentiment analysis between English and French. Given a 50-50 dataset of French and English, we aim to determine if there exists a language bias and explore how the incorporation of more diverse datasets in the future might affect the equity of multilingual Natural Language Processing (NLP) systems. By employing Support Vector Machine (SVM) and Naive Bayes models on three balanced datasets, we reveal potential biases in multilingual sentiment classification. Utilizing Fairlearn, a tool for assessing bias in machine learning models, our findings indicate nuanced outcomes. With French data outperforming English across accuracy, recall, and F1 score metrics in both models, hinting at a language bias favoring French. However, Fairlearn's metrics suggest that the SVM approaches equitable levels with a demographic parity ratio of 0.963, 0.989, and 0.985 for the three separate datasets, indicating near-equitable treatment across languages. In contrast, Naive Bayes demonstrates greater disparities, evidenced by a demographic parity ratio of 0.813, 0.908, and 0.961. These findings reveal the importance of developing equitable multilingual NLP systems, particularly as we anticipate the inclusion of more datasets in various languages in the future.
- [28] arXiv:2405.06694 [pdf, ps, other]
-
Title: SUTRA: Scalable Multilingual Language Model ArchitectureSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this paper, we introduce SUTRA, multilingual Large Language Model architecture capable of understanding, reasoning, and generating text in over 50 languages. SUTRA's design uniquely decouples core conceptual understanding from language-specific processing, which facilitates scalable and efficient multilingual alignment and learning. Employing a Mixture of Experts framework both in language and concept processing, SUTRA demonstrates both computational efficiency and responsiveness. Through extensive evaluations, SUTRA is demonstrated to surpass existing models like GPT-3.5, Llama2 by 20-30% on leading Massive Multitask Language Understanding (MMLU) benchmarks for multilingual tasks. SUTRA models are also online LLMs that can use knowledge from the internet to provide hallucination-free, factual and up-to-date responses while retaining their multilingual capabilities. Furthermore, we explore the broader implications of its architecture for the future of multilingual AI, highlighting its potential to democratize access to AI technology globally and to improve the equity and utility of AI in regions with predominantly non-English languages. Our findings suggest that SUTRA not only fills pivotal gaps in multilingual model capabilities but also establishes a new benchmark for operational efficiency and scalability in AI applications.
- [29] arXiv:2405.06695 [pdf, ps, other]
-
Title: Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural NetworksComments: Published in 2024 American Medical Informatics Association (AMIA) Summit March 18-21Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
An important issue impacting healthcare is a lack of available experts. Machine learning (ML) models could resolve this by aiding in diagnosing patients. However, creating datasets large enough to train these models is expensive. We evaluated large language models (LLMs) for data creation. Using Autism Spectrum Disorders (ASD), we prompted ChatGPT and GPT-Premium to generate 4,200 synthetic observations to augment existing medical data. Our goal is to label behaviors corresponding to autism criteria and improve model accuracy with synthetic training data. We used a BERT classifier pre-trained on biomedical literature to assess differences in performance between models. A random sample (N=140) from the LLM-generated data was evaluated by a clinician and found to contain 83% correct example-label pairs. Augmenting data increased recall by 13% but decreased precision by 16%, correlating with higher quality and lower accuracy across pairs. Future work will analyze how different synthetic data traits affect ML outcomes.
- [30] arXiv:2405.06696 [pdf, ps, html, other]
-
Title: Multi-level Shared Knowledge Guided Learning for Knowledge Graph CompletionComments: The paper has been accepted for publication at TACL. And the arXiv version is a pre-MIT Press publication versionSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In the task of Knowledge Graph Completion (KGC), the existing datasets and their inherent subtasks carry a wealth of shared knowledge that can be utilized to enhance the representation of knowledge triplets and overall performance. However, no current studies specifically address the shared knowledge within KGC. To bridge this gap, we introduce a multi-level Shared Knowledge Guided learning method (SKG) that operates at both the dataset and task levels. On the dataset level, SKG-KGC broadens the original dataset by identifying shared features within entity sets via text summarization. On the task level, for the three typical KGC subtasks - head entity prediction, relation prediction, and tail entity prediction - we present an innovative multi-task learning architecture with dynamically adjusted loss weights. This approach allows the model to focus on more challenging and underperforming tasks, effectively mitigating the imbalance of knowledge sharing among subtasks. Experimental results demonstrate that SKG-KGC outperforms existing text-based methods significantly on three well-known datasets, with the most notable improvement on WN18RR.
- [31] arXiv:2405.06697 [pdf, ps, html, other]
-
Title: Automated Conversion of Static to Dynamic Scheduler via Natural LanguageComments: 7 pages (excluding appendix), 10 figures, 3 tablesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this paper, we explore the potential application of Large Language Models (LLMs) that will automatically model constraints and generate code for dynamic scheduling problems given an existing static model. Static scheduling problems are modelled and coded by optimization experts. These models may be easily obsoleted as the underlying constraints may need to be fine-tuned in order to reflect changes in the scheduling rules. Furthermore, it may be necessary to turn a static model into a dynamic one in order to cope with disturbances in the environment. In this paper, we propose a Retrieval-Augmented Generation (RAG) based LLM model to automate the process of implementing constraints for Dynamic Scheduling (RAGDyS), without seeking help from an optimization modeling expert. Our framework aims to minimize technical complexities related to mathematical modelling and computational workload for end-users, thereby allowing end-users to quickly obtain a new schedule close to the original schedule with changes reflected by natural language constraint descriptions.
- [32] arXiv:2405.06699 [pdf, ps, other]
-
Title: ChatSOS: Vector Database Augmented Generative Question Answering Assistant in Safety EngineeringSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
With the rapid advancement of natural language processing technologies, generative artificial intelligence techniques, represented by large language models (LLMs), are gaining increasing prominence and demonstrating significant potential for applications in safety engineering. However, fundamental LLMs face constraints such as limited training data coverage and unreliable responses. This study develops a vector database from 117 explosion accident reports in China spanning 2013 to 2023, employing techniques such as corpus segmenting and vector embedding. By utilizing the vector database, which outperforms the relational database in information retrieval quality, we provide LLMs with richer, more relevant knowledge. Comparative analysis of LLMs demonstrates that ChatSOS significantly enhances reliability, accuracy, and comprehensiveness, improves adaptability and clarification of responses. These results illustrate the effectiveness of supplementing LLMs with an external database, highlighting their potential to handle professional queries in safety engineering and laying a foundation for broader applications.
- [33] arXiv:2405.06701 [pdf, ps, other]
-
Title: Lightweight Spatial Modeling for Combinatorial Information Extraction From DocumentsYanfei Dong, Lambert Deng, Jiazheng Zhang, Xiaodong Yu, Ting Lin, Francesco Gelli, Soujanya Poria, Wee Sun LeeSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinatorial matching to address the one-to-one mapping property that exists in many documents, where one field has only one corresponding entity. Moreover, our method is highly parameter-efficient compared to existing approaches in terms of the number of trainable parameters. Despite this, experiments across various datasets show our method outperforms baselines in most entity types. Many real-world documents exhibit combinatorial properties which can be leveraged as inductive biases to improve extraction accuracy, but existing datasets do not cover these documents. To facilitate future research into these types of documents, we release a new ID document dataset that covers diverse templates and languages. We also release enhanced annotations for an existing dataset.
- [34] arXiv:2405.06702 [pdf, ps, html, other]
-
Title: Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision TechniquesAbhinand K., Abhiram B. Nair, Dhananjay C., Hanan Hamza, Mohammed Fawaz J., Rahma Fahim K., Anoop V. SSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Technological advancements and innovations are advancing our daily life in all the ways possible but there is a larger section of society who are deprived of accessing the benefits due to their physical inabilities. To reap the real benefits and make it accessible to society, these talented and gifted people should also use such innovations without any hurdles. Many applications developed these days address these challenges, but localized communities and other constrained linguistic groups may find it difficult to use them. Malayalam, a Dravidian language spoken in the Indian state of Kerala is one of the twenty-two scheduled languages in India. Recent years have witnessed a surge in the development of systems and tools in Malayalam, addressing the needs of Kerala, but many of them are not empathetically designed to cater to the needs of hearing-impaired people. One of the major challenges is the limited or no availability of sign language data for the Malayalam language and sufficient efforts are not made in this direction. In this connection, this paper proposes an approach for sign language identification for the Malayalam language using advanced deep learning and computer vision techniques. We start by developing a labeled dataset for Malayalam letters and for the identification we use advanced deep learning techniques such as YOLOv8 and computer vision. Experimental results show that the identification accuracy is comparable to other sign language identification systems and other researchers in sign language identification can use the model as a baseline to develop advanced models.
- [35] arXiv:2405.06703 [pdf, ps, html, other]
-
Title: Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performanceSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In this paper, we introduce the Interpretable Cross-Examination Technique (ICE-T), a novel approach that leverages structured multi-prompt techniques with Large Language Models (LLMs) to improve classification performance over zero-shot and few-shot methods. In domains where interpretability is crucial, such as medicine and law, standard models often fall short due to their "black-box" nature. ICE-T addresses these limitations by using a series of generated prompts that allow an LLM to approach the problem from multiple directions. The responses from the LLM are then converted into numerical feature vectors and processed by a traditional classifier. This method not only maintains high interpretability but also allows for smaller, less capable models to achieve or exceed the performance of larger, more advanced models under zero-shot conditions. We demonstrate the effectiveness of ICE-T across a diverse set of data sources, including medical records and legal documents, consistently surpassing the zero-shot baseline in terms of classification metrics such as F1 scores. Our results indicate that ICE-T can be used for improving both the performance and transparency of AI applications in complex decision-making environments.
- [36] arXiv:2405.06704 [pdf, ps, html, other]
-
Title: Enhanced Review Detection and Recognition: A Platform-Agnostic Approach with Application to Online CommerceSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Online commerce relies heavily on user generated reviews to provide unbiased information about products that they have not physically seen. The importance of reviews has attracted multiple exploitative online behaviours and requires methods for monitoring and detecting reviews. We present a machine learning methodology for review detection and extraction, and demonstrate that it generalises for use across websites that were not contained in the training data. This method promises to drive applications for automatic detection and evaluation of reviews, regardless of their source. Furthermore, we showcase the versatility of our method by implementing and discussing three key applications for analysing reviews: Sentiment Inconsistency Analysis, which detects and filters out unreliable reviews based on inconsistencies between ratings and comments; Multi-language support, enabling the extraction and translation of reviews from various languages without relying on HTML scraping; and Fake review detection, achieved by integrating a trained NLP model to identify and distinguish between genuine and fake reviews.
- [37] arXiv:2405.06705 [pdf, ps, other]
-
Title: LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-ThoughtComments: To appear at IJCAI 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle with reliably identifying reasoning mistakes when using simplistic prompting strategies. To address this challenge, we introduce a unique prompting strategy, termed the Pedagogical Chain-of-Thought (PedCoT), which is specifically designed to guide the identification of reasoning mistakes, particularly mathematical reasoning mistakes. PedCoT consists of pedagogical principles for prompts (PPP) design, two-stage interaction process (TIP) and grounded PedCoT prompts, all inspired by the educational theory of the Bloom Cognitive Model (BCM). We evaluate our approach on two public datasets featuring math problems of varying difficulty levels. The experiments demonstrate that our zero-shot prompting strategy significantly outperforms strong baselines. The proposed method can achieve the goal of reliable mathematical mistake identification and provide a foundation for automatic math answer grading. The results underscore the significance of educational theory, serving as domain knowledge, in guiding prompting strategy design for addressing challenging tasks with LLMs effectively.
- [38] arXiv:2405.06706 [pdf, ps, html, other]
-
Title: Exploring the Capabilities of Large Multimodal Models on Dense TextSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
While large multi-modal models (LMM) have shown notable progress in multi-modal tasks, their capabilities in tasks involving dense textual content remains to be fully explored. Dense text, which carries important information, is often found in documents, tables, and product descriptions. Understanding dense text enables us to obtain more accurate information, assisting in making better decisions. To further explore the capabilities of LMM in complex text tasks, we propose the DT-VQA dataset, with 170k question-answer pairs. In this paper, we conduct a comprehensive evaluation of GPT4V, Gemini, and various open-source LMMs on our dataset, revealing their strengths and weaknesses. Furthermore, we evaluate the effectiveness of two strategies for LMM: prompt engineering and downstream fine-tuning. We find that even with automatically labeled training datasets, significant improvements in model performance can be achieved. We hope that this research will promote the study of LMM in dense text tasks. Code will be released at this https URL.
- [39] arXiv:2405.06707 [pdf, ps, other]
-
Title: Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Combining different forms of prompts with pre-trained large language models has yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought prompting). However, along with testing on more complex reasoning, these methods also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop \textit{Hypothesis Testing Prompting}, which adds conclusion assumptions, backward reasoning, and fact verification during intermediate reasoning steps. \textit{Hypothesis Testing prompting} involves multiple assumptions and reverses validation of conclusions leading to its unique correct answer. Experiments on two challenging deductive reasoning datasets ProofWriter and RuleTaker show that hypothesis testing prompting not only significantly improves the effect, but also generates a more reasonable and standardized reasoning process.
- [40] arXiv:2405.06709 [pdf, ps, html, other]
-
Title: Evaluating the Efficacy of AI Techniques in Textual Anonymization: A Comparative StudyDimitris Asimopoulos, Ilias Siniosoglou, Vasileios Argyriou, Sotirios K. Goudos, Konstantinos E. Psannis, Nikoleta Karditsioti, Theocharis Saoulidis, Panagiotis SarigiannidisSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In the digital era, with escalating privacy concerns, it's imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformers architecture. Each model presents unique strengths since LSTM is modeling long-term dependencies, CRF captures dependencies among word sequences, ELMo delivers contextual word representations using deep bidirectional language models and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text anonymisation challenges. Preliminary results indicate that CRF, LSTM, and ELMo individually outperform traditional methods. The inclusion of Transformers, when compared alongside with the other models, offers a broader perspective on achieving optimal text anonymisation in contemporary settings.
- [41] arXiv:2405.06710 [pdf, ps, other]
-
Title: Mobile SequencersSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
The article is an attempt to contribute to explorations of a common origin for language and planned-collaborative action. It gives `semantics of change' the central stage in the synthesis, from its history and recordkeeping to its development, its syntax, delivery and reception, including substratal aspects.
It is suggested that to arrive at a common core, linguistic semantics must be understood as studying through syntax mobile agent's representing, tracking and coping with change and no change. Semantics of actions can be conceived the same way, but through plans instead of syntax. The key point is the following: Sequencing itself, of words and action sequences, brings in more structural interpretation to the sequence than which is immediately evident from the sequents themselves. Mobile sequencers can be understood as subjects structuring reporting, understanding and keeping track of change and no change. The idea invites rethinking of the notion of category, both in language and in planning.
Understanding understanding change by mobile agents is suggested to be about human extended practice, not extended-human practice. That's why linguistics is as important as computer science in the synthesis. It must rely on representational history of acts, thoughts and expressions, personal and public, crosscutting overtness and covertness of these phenomena. It has implication for anthropology in the extended practice, which is covered briefly. - [42] arXiv:2405.06712 [pdf, ps, other]
-
Title: Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common IllnessesComments: 14 pages, 4 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The recent swift development of LLMs like GPT-4, Gemini, and GPT-3.5 offers a transformative opportunity in medicine and healthcare, especially in digital diagnostics. This study evaluates each model diagnostic abilities by interpreting a user symptoms and determining diagnoses that fit well with common illnesses, and it demonstrates how each of these models could significantly increase diagnostic accuracy and efficiency. Through a series of diagnostic prompts based on symptoms from medical databases, GPT-4 demonstrates higher diagnostic accuracy from its deep and complete history of training on medical data. Meanwhile, Gemini performs with high precision as a critical tool in disease triage, demonstrating its potential to be a reliable model when physicians are trying to make high-risk diagnoses. GPT-3.5, though slightly less advanced, is a good tool for medical diagnostics. This study highlights the need to study LLMs for healthcare and clinical practices with more care and attention, ensuring that any system utilizing LLMs promotes patient privacy and complies with health information privacy laws such as HIPAA compliance, as well as the social consequences that affect the varied individuals in complex healthcare contexts. This study marks the start of a larger future effort to study the various ways in which assigning ethical concerns to LLMs task of learning from human biases could unearth new ways to apply AI in complex medical settings.
- [43] arXiv:2405.06713 [pdf, ps, other]
-
Title: Unveiling the Competitive Dynamics: A Comparative Evaluation of American and Chinese LLMsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The strategic significance of Large Language Models (LLMs) in economic expansion, innovation, societal development, and national security has been increasingly recognized since the advent of ChatGPT. This study provides a comprehensive comparative evaluation of American and Chinese LLMs in both English and Chinese contexts. We proposed a comprehensive evaluation framework that encompasses natural language proficiency, disciplinary expertise, and safety and responsibility, and systematically assessed 16 prominent models from the US and China under various operational tasks and scenarios. Our key findings show that GPT 4-Turbo is at the forefront in English contexts, whereas Ernie-Bot 4 stands out in Chinese contexts. The study also highlights disparities in LLM performance across languages and tasks, stressing the necessity for linguistically and culturally nuanced model development. The complementary strengths of American and Chinese LLMs point to the value of Sino-US collaboration in advancing LLM technology. The research presents the current LLM competition landscape and offers valuable insights for policymakers and businesses regarding strategic LLM investments and development. Future work will expand on this framework to include emerging LLM multimodal capabilities and business application assessments.
- [44] arXiv:2405.06714 [pdf, ps, other]
-
Title: Towards a path dependent account of category fluencyComments: To appear at CogSci 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Category fluency is a widely studied cognitive phenomenon, yet two conflicting accounts have been proposed as the underlying retrieval mechanism -- an optimal foraging process deliberately searching through memory (Hills et al., 2012) and a random walk sampling from a semantic network (Abbott et al., 2015). Evidence for both accounts has centered around predicting human patch switches, where both existing models of category fluency produce paradoxically identical results. We begin by peeling back the assumptions made by existing models, namely that each named example only depends on the previous example, by (i) adding an additional bias to model the category transition probability directly and (ii) relying on a large language model to predict based on the entire existing sequence. Then, we present evidence towards resolving the disagreement between each account of foraging by reformulating models as sequence generators. To evaluate, we compare generated category fluency runs to a bank of human-written sequences by proposing a metric based on n-gram overlap. We find category switch predictors do not necessarily produce human-like sequences, in fact the additional biases used by the Hills et al. (2012) model are required to improve generation quality, which are later improved by our category modification. Even generating exclusively with an LLM requires an additional global cue to trigger the patch switching behavior during production. Further tests on only the search process on top of the semantic network highlight the importance of deterministic search to replicate human behavior.
- [45] arXiv:2405.06715 [pdf, ps, html, other]
-
Title: Enhancing Creativity in Large Language Models through Associative Thinking StrategiesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper explores the enhancement of creativity in Large Language Models (LLMs) like vGPT-4 through associative thinking, a cognitive process where creative ideas emerge from linking seemingly unrelated concepts. Associative thinking strategies have been found to effectively help humans boost creativity. However, whether the same strategies can help LLMs become more creative remains under-explored. In this work, we investigate whether prompting LLMs to connect disparate concepts can augment their creative outputs. Focusing on three domains -- Product Design, Storytelling, and Marketing -- we introduce creativity tasks designed to assess vGPT-4's ability to generate original and useful content. By challenging the models to form novel associations, we evaluate the potential of associative thinking to enhance the creative capabilities of LLMs. Our findings show that leveraging associative thinking techniques can significantly improve the originality of vGPT-4's responses.
- [46] arXiv:2405.06719 [pdf, ps, html, other]
-
Title: Enhancing Traffic Prediction with Textual Data Using Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Traffic prediction is pivotal for rational transportation supply scheduling and allocation. Existing researches into short-term traffic prediction, however, face challenges in adequately addressing exceptional circumstances and integrating non-numerical contextual information like weather into models. While, Large language models offer a promising solution due to their inherent world knowledge. However, directly using them for traffic prediction presents drawbacks such as high cost, lack of determinism, and limited mathematical capability. To mitigate these issues, this study proposes a novel approach. Instead of directly employing large models for prediction, it utilizes them to process textual information and obtain embeddings. These embeddings are then combined with historical traffic data and inputted into traditional spatiotemporal forecasting models. The study investigates two types of special scenarios: regional-level and node-level. For regional-level scenarios, textual information is represented as a node connected to the entire network. For node-level scenarios, embeddings from the large model represent additional nodes connected only to corresponding nodes. This approach shows a significant improvement in prediction accuracy according to our experiment of New York Bike dataset.
- [47] arXiv:2405.06721 [pdf, ps, html, other]
-
Title: Kolmogorov-Arnold Networks are Radial Basis Function NetworksSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This short paper is a fast proof-of-concept that the 3-order B-splines used in Kolmogorov-Arnold Networks (KANs) can be well approximated by Gaussian radial basis functions. Doing so leads to FastKAN, a much faster implementation of KAN which is also a radial basis function (RBF) network.
- [48] arXiv:2405.06726 [pdf, ps, other]
-
Title: Region of Attraction Estimation for Free-Floating Systems under Time-Varying LQR ControlSubjects: Systems and Control (eess.SY)
Future Active Debris Removal (ADR) and On Orbit Servicing (OOS) missions demand for elaborate closed loop controllers. Feasible control architectures should take into consideration the inherent coupling of the free floating dynamics and the kinematics of the system. Recently, Time-Varying Linear Quadratic Regulators (TVLQR) have been used to stabilize underactuated systems that exhibit a similar kinodynamic coupling. Furthermore, this control approach integrates synergistically with Lyapunov based region of attraction (ROA) estimation, which, in the context of ADR and OOS, allows for reasoning about composability of different sub-maneuvers. In this paper, TVLQR was used to stabilize an ADR detumbling maneuver in simulation. Moreover, the ROA of the closed loop dynamics was estimated using a probabilistic method. In order to demonstrate the real-world applicability for free floating robots, further experiments were conducted onboard a free floating testbed.
- [49] arXiv:2405.06728 [pdf, ps, html, other]
-
Title: THERADIA WoZ: An Ecological Corpus for Appraisal-based Affect Research in HealthcareHippolyte Fournier, Sina Alisamir, Safaa Azzakhnini, Hanna Chainay, Olivier Koenig, Isabella Zsoldos, Eléeonore Trân, Gérard Bailly, Frédéeric Elisei, Béatrice Bouchot, Brice Varini, Patrick Constant, Joan Fruitet, Franck Tarpin-Bernard, Solange Rossato, François Portet, Fabien RingevalSubjects: Human-Computer Interaction (cs.HC)
We present THERADIA WoZ, an ecological corpus designed for audiovisual research on affect in healthcare. Two groups of senior individuals, consisting of 52 healthy participants and 9 individuals with Mild Cognitive Impairment (MCI), performed Computerised Cognitive Training (CCT) exercises while receiving support from a virtual assistant, tele-operated by a human in the role of a Wizard-of-Oz (WoZ). The audiovisual expressions produced by the participants were fully transcribed, and partially annotated based on dimensions derived from recent models of the appraisal theories, including novelty, intrinsic pleasantness, goal conduciveness, and coping. Additionally, the annotations included 23 affective labels drew from the literature of achievement affects. We present the protocols used for the data collection, transcription, and annotation, along with a detailed analysis of the annotated dimensions and labels. Baseline methods and results for their automatic prediction are also presented. The corpus aims to serve as a valuable resource for researchers in affective computing, and is made available to both industry and academia.
- [50] arXiv:2405.06730 [pdf, ps, other]
-
Title: Ocean-DC: An analysis ready data cube framework for environmental and climate change monitoring over the port areasSubjects: Databases (cs.DB); Atmospheric and Oceanic Physics (physics.ao-ph)
The environmental hazards and climate change effects causes serious problems in land and coastal areas. A solution to this problem can be the periodic monitoring over critical areas, like coastal region with heavy industrial activity (i.e., ship-buildings) or areas where a disaster (i.e., oil-spill) has occurred. Today there are several Earth and non-Earth Observation data available from several data providers. These data are huge in size and usually it is needed to combine several data from multiple sources (i.e., data with format differences) for a more effective evaluation. For addressing these issues, this work proposes the Ocean-DC framework as a solution in data harmonization and homogenization. A strong advantage of this Data Cube implementation is the generation of a single NetCDF product that contains Earth Observation data of several data types (i.e., Landsat-8 and Sentinel-2). To evaluate the effectiveness and efficiency of the Ocean-DC implementation, it is examined a case study of an oil-spill in Saronic gulf in September of 2017. The generated 4D Data Cube considers both Landsat-8,9 and Sentinel-2 products for a time-series analysis, before, during, and after the oil-spill event. The Ocean-DC framework successfully generated a NetCDF product, containing all the necessary remote sensing products for monitoring the oil-spill disaster in the Saronic gulf.
- [51] arXiv:2405.06747 [pdf, ps, html, other]
-
Title: Music Emotion Prediction Using Recurrent Neural NetworksComments: 15 pages, 13 figuresSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
This study explores the application of recurrent neural networks to recognize emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states. We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories. Our approach involves extracting a comprehensive set of audio features using Librosa and applying various recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks. Initial experiments are conducted using a dataset of 900 audio clips, labeled according to the emotional quadrants. We compare the performance of our neural network models against a set of baseline classifiers and analyze their effectiveness in capturing the temporal dynamics inherent in musical expression. The results indicate that simpler RNN architectures may perform comparably or even superiorly to more complex models, particularly in smaller datasets. We've also applied the following experiments on larger datasets: one is augmented based on our original dataset, and the other is from other sources. This research not only enhances our understanding of the emotional impact of music but also demonstrates the potential of neural networks in creating more personalized and emotionally resonant music recommendation and therapy systems.
- [52] arXiv:2405.06749 [pdf, ps, html, other]
-
Title: Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance EstimationVasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios VoulodimosSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles with efficiency and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset which is the largest (to the best of our knowledge) air-to-air airborne object dataset.
- [53] arXiv:2405.06754 [pdf, ps, html, other]
-
Title: Wall-Street: Smart Surface-Enabled 5G mmWave for Roadside NetworkingComments: 15 pages, 22 figures, under submissionSubjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
5G mmWave roadside networks promise high-speed wireless connectivity, but face significant challenges in maintaining reliable connections for users moving at high speed. Frequent handovers, complex beam alignment, and signal attenuation due to obstacles like car bodies lead to service interruptions and degraded performance. We present Wall-Street, a smart surface installed on vehicles to enhance 5G mmWave connectivity for users inside. Wall-Street improves mobility management by (1) steering outdoor mmWave signals into the vehicle, ensuring coverage for all users; (2) enabling simultaneous serving cell data transfer and candidate handover cell measurement, allowing seamless handovers without service interruption; and (3) combining beams from source and target cells during a handover to increase reliability. Through its flexible and diverse signal manipulation capabilities, Wall-Street provides uninterrupted high-speed connectivity for latency-sensitive applications in challenging mobile environments. We have implemented and integrated Wall-Street in the COSMOS testbed and evaluated its real-time performance with four gNBs and a mobile client inside a surface-enabled vehicle, driving on a nearby road. Wall-Street achieves a 2.5-3.4x TCP throughput improvement and a 0.4-0.8x reduction in delay over a baseline 5G Standalone handover protocol.
- [54] arXiv:2405.06758 [pdf, ps, html, other]
-
Title: Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier DesignsSubjects: Machine Learning (cs.LG)
Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, the effectiveness of prior arithmetic design techniques proves inadequate, as it does not sufficiently optimize speed and area, resulting in a reduced processing rate and larger module size. To boost the arithmetic performance, in this work, we focus on the two most common and fundamental arithmetic modules: adders and multipliers. We cast the design tasks as single-player tree generation games, leveraging reinforcement learning techniques to optimize their arithmetic tree structures. Such a tree generation formulation allows us to efficiently navigate the vast search space and discover superior arithmetic designs that improve computational efficiency and hardware size within just a few hours. For adders, our approach discovers designs of 128-bit adders that achieve Pareto optimality in theoretical metrics. Compared with the state-of-the-art PrefixRL, our method decreases computational delay and hardware size by up to 26% and 30%, respectively. For multipliers, when compared to RL-MUL, our approach increases speed and reduces size by as much as 49% and 45%. Moreover, the inherent flexibility and scalability of our method enable us to deploy our designs into cutting-edge technologies, as we show that they can be seamlessly integrated into 7nm technology. We believe our work will offer valuable insights into hardware design, further accelerating speed and reducing size through the refined search space and our tree generation methodologies. See our introduction video at this https URL. Codes are released at this https URL.
- [55] arXiv:2405.06760 [pdf, ps, other]
-
Title: Opportunities for Persian Digital Humanities Research with Artificial Intelligence Language Models; Case Study: Forough FarrokhzadArash Rasti Meymandi, Zahra Hosseini, Sina Davari, Abolfazl Moshiri, Shabnam Rahimi-Golkhandan, Khashayar Namdar, Nikta Feizi, Mohamad Tavakoli-Targhi, Farzad KhalvatiSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This study explores the integration of advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques to analyze and interpret Persian literature, focusing on the poetry of Forough Farrokhzad. Utilizing computational methods, we aim to unveil thematic, stylistic, and linguistic patterns in Persian poetry. Specifically, the study employs AI models including transformer-based language models for clustering of the poems in an unsupervised framework. This research underscores the potential of AI in enhancing our understanding of Persian literary heritage, with Forough Farrokhzad's work providing a comprehensive case study. This approach not only contributes to the field of Persian Digital Humanities but also sets a precedent for future research in Persian literary studies using computational techniques.
- [56] arXiv:2405.06761 [pdf, ps, other]
-
Title: Tree Proof-of-Position AlgorithmsSubjects: Data Structures and Algorithms (cs.DS)
We present a novel class of proof-of-position algorithms: Tree-Proof-of-Position (T-PoP). This algorithm is decentralised, collaborative and can be computed in a privacy preserving manner, such that agents do not need to reveal their position publicly. We make no assumptions of honest behaviour in the system, and consider varying ways in which agents may misbehave. Our algorithm is therefore resilient to highly adversarial scenarios. This makes it suitable for a wide class of applications, namely those in which trust in a centralised infrastructure may not be assumed, or high security risk scenarios. Our algorithm has a worst case quadratic runtime, making it suitable for hardware constrained IoT applications. We also provide a mathematical model that summarises T-PoP's performance for varying operating conditions. We then simulate T-PoP's behaviour with a large number of agent-based simulations, which are in complete agreement with our mathematical model, thus demonstrating its validity. T-PoP can achieve high levels of reliability and security by tuning its operating conditions, both in high and low density environments. Finally, we also present a mathematical model to probabilistically detect platooning attacks.
- [57] arXiv:2405.06762 [pdf, ps, html, other]
-
Title: LIVE: LaTex Interactive Visual EditingComments: 8 pages, double column, ieeeSubjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
LaTex coding is one of the main methods of writing an academic paper. When writing a paper, abundant proper visual or graphic components will represent more information volume than the textual data. However, most of the implementation of LaTex graphic items are designed as static items that have some weaknesses in representing more informative figures or tables with an interactive reading experience. To address this problem, we propose LIVE, a novel design methods idea to design interactive LaTex graphic items. To make a lucid representation of the main idea of LIVE, we designed several novels representing implementations that are interactive and enough explanation for the basic level principles. Using LIVE can design more graphic items, which we call the Gitems, and easily and automatically get the relationship of the mutual application of a specific range of papers, which will add more vitality and performance factors into writing of traditional papers especially the review papers. For vividly representing the functions of LIVE, we use the papers from NeRF as the example reference papers. The code of the implementation project is open source.
- [58] arXiv:2405.06765 [pdf, ps, html, other]
-
Title: Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object DetectionAnastasios Arsenos, Vasileios Karampinis, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios VoulodimosSubjects: Computer Vision and Pattern Recognition (cs.CV)
The main barrier to achieving fully autonomous flights lies in autonomous aircraft navigation. Managing non-cooperative traffic presents the most important challenge in this problem. The most efficient strategy for handling non-cooperative traffic is based on monocular video processing through deep learning models. This study contributes to the vision-based deep learning aircraft detection and tracking literature by investigating the impact of data corruption arising from environmental and hardware conditions on the effectiveness of these methods. More specifically, we designed $7$ types of common corruptions for camera inputs taking into account real-world flight conditions. By applying these corruptions to the Airborne Object Tracking (AOT) dataset we constructed the first robustness benchmark dataset named AOT-C for air-to-air aerial object detection. The corruptions included in this dataset cover a wide range of challenging conditions such as adverse weather and sensor noise. The second main contribution of this letter is to present an extensive experimental evaluation involving $8$ diverse object detectors to explore the degradation in the performance under escalating levels of corruptions (domain shifts). Based on the evaluation results, the key observations that emerge are the following: 1) One-stage detectors of the YOLO family demonstrate better robustness, 2) Transformer-based and multi-stage detectors like Faster R-CNN are extremely vulnerable to corruptions, 3) Robustness against corruptions is related to the generalization ability of models. The third main contribution is to present that finetuning on our augmented synthetic data results in improvements in the generalisation ability of the object detector in real-world flight experiments.
- [59] arXiv:2405.06766 [pdf, ps, html, other]
-
Title: Dynamic Optimization of Proton Exchange Membrane Water Electrolyzers Considering Usage-Based DegradationComments: 61 pages, 19 figures, includes SISubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
We present a techno-economic optimization model for evaluating the design and operation of proton exchange membrane (PEM) electrolyzers, crucial for hydrogen production powered by variable renewable electricity. This model integrates a 0-D physics representation of the electrolyzer stack, complete mass and energy balances, operational constraints, and empirical data on use-dependent degradation. Utilizing a decomposition approach, the model predicts optimal electrolyzer size, operation, and necessary hydrogen storage to satisfy baseload demands across various technology and electricity price scenarios. Analysis for 2022 shows that including degradation effects raises the levelized cost of hydrogen from \$4.56/kg to \$6.60/kg and decreases stack life to two years. However, projections for 2030 anticipate a significant reduction in costs to approximately \$2.50/kg due to lower capital expenses, leading to larger stacks, extended lifetimes, and less hydrogen storage. This approach is adaptable to other electrochemical systems relevant for decarbonization.
- [60] arXiv:2405.06767 [pdf, ps, html, other]
-
Title: Color: A Framework for Applying Graph Coloring to Subgraph Cardinality EstimationSubjects: Databases (cs.DB)
Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up of entirely many-to-many joins with complex correlations. This puts significant stress on traditional cardinality estimation methods which generally see catastrophic errors when estimating the size of queries with only a handful of joins. To overcome this, we propose COLOR, a framework for subgraph cardinality estimation which applies insights from graph compression theory to produce a compact summary that captures the global topology of the data graph. Further, we identify several key optimizations that enable tractable estimation over this summary even for large query graphs. We then evaluate several designs within this framework and find that they improve accuracy by up to 10$^3$x over all competing methods while maintaining fast inference, a small memory footprint, efficient construction, and graceful degradation under updates.
- [61] arXiv:2405.06770 [pdf, ps, html, other]
-
Title: Demonstrating Reinforcement Learning and Run Time Assurance for Spacecraft Inspection Using Unmanned Aerial VehiclesSubjects: Systems and Control (eess.SY)
On-orbit spacecraft inspection is an important capability for enabling servicing and manufacturing missions and extending the life of spacecraft. However, as space operations become increasingly more common and complex, autonomous control methods are needed to reduce the burden on operators to individually monitor each mission. In order for autonomous control methods to be used in space, they must exhibit safe behavior that demonstrates robustness to real world disturbances and uncertainty. In this paper, neural network controllers (NNCs) trained with reinforcement learning are used to solve an inspection task, which is a foundational capability for servicing missions. Run time assurance (RTA) is used to assure safety of the NNC in real time, enforcing several different constraints on position and velocity. The NNC and RTA are tested in the real world using unmanned aerial vehicles designed to emulate spacecraft dynamics. The results show this emulation is a useful demonstration of the capability of the NNC and RTA, and the algorithms demonstrate robustness to real world disturbances.
- [62] arXiv:2405.06771 [pdf, ps, html, other]
-
Title: Space Processor Computation Time Analysis for Reinforcement Learning and Run Time Assurance Control PoliciesKyle Dunlap, Nathaniel Hamilton, Francisco Viramontes, Derrek Landauer, Evan Kain, Kerianne L. HobbsSubjects: Systems and Control (eess.SY)
As the number of spacecraft on orbit continues to grow, it is challenging for human operators to constantly monitor and plan for all missions. Autonomous control methods such as reinforcement learning (RL) have the power to solve complex tasks while reducing the need for constant operator intervention. By combining RL solutions with run time assurance (RTA), safety of these systems can be assured in real time. However, in order to use these algorithms on board a spacecraft, they must be able to run in real time on space grade processors, which are typically outdated and less capable than state-of-the-art equipment. In this paper, multiple RL-trained neural network controllers (NNCs) and RTA algorithms were tested on commercial-off-the-shelf (COTS) and radiation tolerant processors. The results show that all NNCs and most RTA algorithms can compute optimal and safe actions in well under 1 second with room for further optimization before deploying in the real world.
- [63] arXiv:2405.06772 [pdf, ps, other]
-
Title: CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach vs. Expensive LLMComments: Published in 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), Conference Date: 07-09 February 2024Journal-ref: 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), Houston, TX, USA, 2024, pp. 1-12Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In today's digital landscape, where cyber attacks have become the norm, the detection of cyber attacks and threats is critically imperative across diverse domains. Our research presents a new empirical framework for cyber threat modeling, adept at parsing and categorizing cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL - Cyber Activity News Alerting Language Model, tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero to few-shot learning in cyber news classification. CANAL demonstrates superior performance by outperforming all other LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we introduce the Cyber Signal Discovery module, a strategic component designed to efficiently detect emerging cyber signals from news articles. Collectively, CANAL and Cyber Signal Discovery module equip our framework to provide a robust and cost-effective solution for businesses that require agile responses to cyber intelligence.
- [64] arXiv:2405.06773 [pdf, ps, html, other]
-
Title: A Monotone Circuit Construction for Individually-Secure Multi-Secret SharingSubjects: Information Theory (cs.IT)
In this work, we introduce a new technique for taking a single-secret sharing scheme with a general access structure and transforming it into an individually secure multi-secret sharing scheme where every secret has the same general access structure. To increase the information rate, we consider Individual Security which guarantees zero mutual information with each secret individually, for any unauthorized subsets. Our approach involves identifying which shares of the single-secret sharing scheme can be replaced by linear combinations of messages. When $m-1$ shares are replaced, our scheme obtains an information rate of $m/|S|$, where $S$ is the set of shares. This provides an improvement over the information rate of $1/|S|$ in the original single-secret sharing scheme.
- [65] arXiv:2405.06778 [pdf, ps, html, other]
-
Title: Shape Conditioned Human Motion Generation with Diffusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Human motion synthesis is an important task in computer graphics and computer vision. While focusing on various conditioning signals such as text, action class, or audio to guide the generation process, most existing methods utilize skeleton-based pose representation, requiring additional skinning to produce renderable meshes. Given that human motion is a complex interplay of bones, joints, and muscles, considering solely the skeleton for generation may neglect their inherent interdependency, which can limit the variability and precision of the generated results. To address this issue, we propose a Shape-conditioned Motion Diffusion model (SMD), which enables the generation of motion sequences directly in mesh format, conditioned on a specified target mesh. In SMD, the input meshes are transformed into spectral coefficients using graph Laplacian, to efficiently represent meshes. Subsequently, we propose a Spectral-Temporal Autoencoder (STAE) to leverage cross-temporal dependencies within the spectral domain. Extensive experimental evaluations show that SMD not only produces vivid and realistic motions but also achieves competitive performance in text-to-motion and action-to-motion tasks when compared to state-of-the-art methods.
- [66] arXiv:2405.06780 [pdf, ps, html, other]
-
Title: Deep MMD Gradient Flow without adversarial trainingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST, CELEB-A (64 x64) and LSUN Church (64 x 64). Furthermore, we demonstrate the validity of the approach when MMD is replaced by a lower bound on the KL divergence.
- [67] arXiv:2405.06782 [pdf, ps, html, other]
-
Title: GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship GraphsMingyu Liu, Ekim Yurtsever, Marc Brede, Jun Meng, Walter Zimmer, Xingcheng Zhou, Bare Luka Zagar, Yuning Cui, Alois KnollSubjects: Computer Vision and Pattern Recognition (cs.CV)
Accurate and effective 3D object detection is critical for ensuring the driving safety of autonomous vehicles. Recently, state-of-the-art two-stage 3D object detectors have exhibited promising performance. However, these methods refine proposals individually, ignoring the rich contextual information in the object relationships between the neighbor proposals. In this study, we introduce an object relation module, consisting of a graph generator and a graph neural network (GNN), to learn the spatial information from certain patterns to improve 3D object detection. Specifically, we create an inter-object relationship graph based on proposals in a frame via the graph generator to connect each proposal with its neighbor proposals. Afterward, the GNN module extracts edge features from the generated graph and iteratively refines proposal features with the captured edge features. Ultimately, we leverage the refined features as input to the detection head to obtain detection results. Our approach improves upon the baseline PV-RCNN on the KITTI validation set for the car class across easy, moderate, and hard difficulty levels by 0.82%, 0.74%, and 0.58%, respectively. Additionally, our method outperforms the baseline by more than 1% under the moderate and hard levels BEV AP on the test server.
- [68] arXiv:2405.06783 [pdf, ps, html, other]
-
Title: BLIP: Facilitating the Exploration of Undesirable Consequences of Digital TechnologiesComments: To appear in the Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USASubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Digital technologies have positively transformed society, but they have also led to undesirable consequences not anticipated at the time of design or development. We posit that insights into past undesirable consequences can help researchers and practitioners gain awareness and anticipate potential adverse effects. To test this assumption, we introduce BLIP, a system that extracts real-world undesirable consequences of technology from online articles, summarizes and categorizes them, and presents them in an interactive, web-based interface. In two user studies with 15 researchers in various computer science disciplines, we found that BLIP substantially increased the number and diversity of undesirable consequences they could list in comparison to relying on prior knowledge or searching online. Moreover, BLIP helped them identify undesirable consequences relevant to their ongoing projects, made them aware of undesirable consequences they "had never considered," and inspired them to reflect on their own experiences with technology.
- [69] arXiv:2405.06784 [pdf, ps, html, other]
-
Title: Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical HealthcareComments: 42 pagesSubjects: Machine Learning (cs.LG)
This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) for advancing biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for biomedical applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions.
The incorporation of FL with these sophisticated models presents a promising strategy to harness their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in medical diagnostics and personalized treatment but also addresses critical concerns about data privacy and security in healthcare. This survey reviews the current applications of FMs in federated settings, underscores the challenges, and identifies future research directions including scaling FMs, managing data diversity, and enhancing communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for groundbreaking healthcare innovations. - [70] arXiv:2405.06794 [pdf, ps, html, other]
-
Title: Site-dependent Solutions of Wave Energy Converter Farms with Surrogate Models, Control Co-design, and Layout OptimizationComments: 9 pages, 9 figuresSubjects: Systems and Control (eess.SY)
Design of wave energy converter farms entails multiple domains that are coupled, and thus, their concurrent representation and consideration in early-stage design optimization has the potential to offer new insights and promising solutions with improved performance. Concurrent optimization of physical attributes (e.g., plant) and the control system design is often known as control co-design or CCD. To further improve performance, the layout of the farm must be carefully optimized in order to ensure that constructive effects from hydrodynamic interactions are leveraged, while destructive effects are avoided. The variations in the joint probability distribution of waves, stemming from distinct site locations, affect the farm's performance and can potentially influence decisions regarding optimal plant selection, control strategies, and layout configurations. Therefore, this paper undertakes a concurrent exploration of control co-design and layout optimization for a farm comprising five devices, modeled as heaving cylinders in the frequency domain, situated across four distinct site locations: Alaskan Coasts, East Coast, Pacific Islands, and West Coast. The challenge of efficiently and accurately estimating hydrodynamic coefficients within the optimization loop was mitigated through the application of surrogate modeling and many-body expansion principles. Results indicate the optimized solutions exhibit variations in plant, control, and layout for each candidate site, signifying the importance of system-level design with environmental considerations from the early stages of the design process.
- [71] arXiv:2405.06797 [pdf, ps, html, other]
-
Title: Exponential Lower Bounds on the Double Oracle Algorithm in Zero-Sum GamesSubjects: Computer Science and Game Theory (cs.GT)
The double oracle algorithm is a popular method of solving games, because it is able to reduce computing equilibria to computing a series of best responses. However, its theoretical properties are not well understood. In this paper, we provide exponential lower bounds on the performance of the double oracle algorithm in both partially-observable stochastic games (POSGs) and extensive-form games (EFGs). Our results depend on what is assumed about the tiebreaking scheme -- that is, which meta-Nash equilibrium or best response is chosen, in the event that there are multiple to pick from. In particular, for EFGs, our lower bounds require adversarial tiebreaking, whereas for POSGs, our lower bounds apply regardless of how ties are broken.
- [72] arXiv:2405.06800 [pdf, ps, html, other]
-
Title: LLM-Generated Black-box Explanations Can Be Adversarially HelpfulSubjects: Computation and Language (cs.CL)
Large Language Models (LLMs) are becoming vital tools that help us solve and understand complex problems by acting as digital assistants. LLMs can generate convincing explanations, even when only given the inputs and outputs of these problems, i.e., in a ``black-box'' approach. However, our research uncovers a hidden risk tied to this approach, which we call *adversarial helpfulness*. This happens when an LLM's explanations make a wrong answer look right, potentially leading people to trust incorrect solutions. In this paper, we show that this issue affects not just humans, but also LLM evaluators. Digging deeper, we identify and examine key persuasive strategies employed by LLMs. Our findings reveal that these models employ strategies such as reframing the questions, expressing an elevated level of confidence, and cherry-picking evidence to paint misleading answers in a credible light. To examine if LLMs are able to navigate complex-structured knowledge when generating adversarially helpful explanations, we create a special task based on navigating through graphs. Some LLMs are not able to find alternative paths along simple graphs, indicating that their misleading explanations aren't produced by only logical deductions using complex knowledge. These findings shed light on the limitations of black-box explanation setting. We provide some advice on how to use LLMs as explainers safely.
- [73] arXiv:2405.06801 [pdf, ps, html, other]
-
Title: LEO Satellite Network Access in the Wild: Potentials, Experiences, and ChallengesSami Ma (1), Yi Ching Chou (1), Miao Zhang (1), Hao Fang (1), Haoyuan Zhao (1), Jiangchuan Liu (1), William I. Atlas (2) ((1) Simon Fraser University, (2) Pacific Salmon Foundation)Comments: 8 pages, 6 figuresSubjects: Networking and Internet Architecture (cs.NI)
In the past three years, working with the Pacific Salmon Foundation and various First Nations groups, we have established Starlink-empowered wild salmon monitoring sites in remote Northern British Columbia, Canada. We report our experiences with the network services in these challenging environments, including deep woods and deep valleys, that lack infrastructural support with some close to Starlink's service boundary at the far north. We assess the portability and mobility of the satellite dishes and the quality of existing network access in underdeveloped countries that Starlink expects to cover. Our experiences suggest that network access based on LEO satellite constellations holds promise but faces hurdles such as energy supply constraints and environmental factors like temperature, precipitation, and solar storms. The presence of wildlife and respecting local residents' culture and heritage pose further complications. We envision several technical solutions addressing the challenges and believe that further regulations will be necessary.
- [74] arXiv:2405.06802 [pdf, ps, other]
-
Title: Summarizing Radiology Reports Findings into ImpressionsComments: 10 pages, 6 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Patient hand-off and triage are two fundamental problems in health care. Often doctors must painstakingly summarize complex findings to efficiently communicate with specialists and quickly make decisions on which patients have the most urgent cases. In pursuit of these challenges, we present (1) a model with state-of-art radiology report summarization performance using (2) a novel method for augmenting medical data, and (3) an analysis of the model limitations and radiology knowledge gain. We also provide a data processing pipeline for future models developed on the the MIMIC CXR dataset. Our best performing model was a fine-tuned BERT-to-BERT encoder-decoder with 58.75/100 ROUGE-L F1, which outperformed specialized checkpoints with more sophisticated attention mechanisms. We investigate these aspects in this work.
- [75] arXiv:2405.06804 [pdf, ps, html, other]
-
Title: Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear ProgrammingComments: Accepted to be presented at Audio Engineering Society 156th Convention, 2024 June, Madrid, SpainSubjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
In binaural audio synthesis, aligning head-related impulse responses (HRIRs) in time has been an important pre-processing step, enabling accurate spatial interpolation and efficient data compression. The maximum correlation time delay between spatially nearby HRIRs has previously been used to get accurate and smooth alignment by solving a matrix equation in which the solution has the minimum Euclidean distance to the time delay. However, the Euclidean criterion could lead to an over-smoothing solution in practice. In this paper, we solve the smoothing issue by formulating the task as solving an integer linear programming problem equivalent to minimising an $L^1$-norm. Moreover, we incorporate 1) the cross-correlation of inter-aural HRIRs, and 2) HRIRs with their minimum-phase responses to have more reference measurements for optimisation. We show the proposed method can get more accurate alignments than the Euclidean-based method by comparing the spectral reconstruction loss of time-aligned HRIRs using spherical harmonics representation on seven HRIRs consisting of human and dummy heads. The extra correlation features and the $L^1$-norm are also beneficial in extremely noisy conditions. In addition, this method can be applied to phase unwrapping of head-related transfer functions, where the unwrapped phase could be a compact feature for downstream tasks.
- [76] arXiv:2405.06805 [pdf, ps, html, other]
-
Title: A (Weakly) Polynomial Algorithm for AIVF CodingComments: Expanded version of paper appearing on ISIT 2024Subjects: Data Structures and Algorithms (cs.DS)
It is possible to improve upon Tunstall coding using a collection of multiple parse trees. The best such results so far are Iwata and Yamamoto's maximum cost AIVF codes. The most efficient algorithm for designing such codes is an iterative one that could run in exponential time. In this paper, we show that this problem fits into the framework of a newly developed technique that uses linear programming with the Ellipsoid method to solve the minimum cost Markov chain problem. This permits constructing maximum cost AIVF codes in (weakly) polynomial time.
- [77] arXiv:2405.06806 [pdf, ps, html, other]
-
Title: An Empirical Study on the Effectiveness of Large Language Models for SATD Identification and ClassificationComments: This is the preprint version of a paper that has been submitted to Empirical Software EngineeringSubjects: Software Engineering (cs.SE)
Self-Admitted Technical Debt (SATD), a concept highlighting sub-optimal choices in software development documented in code comments or other project resources, poses challenges in the maintainability and evolution of software systems. Large language models (LLMs) have demonstrated significant effectiveness across a broad range of software tasks, especially in software text generation tasks. Nonetheless, their effectiveness in tasks related to SATD is still under-researched. In this paper, we investigate the efficacy of LLMs in both identification and classification of SATD. For both tasks, we investigate the performance gain from using more recent LLMs, specifically the Flan-T5 family, across different common usage settings. Our results demonstrate that for SATD identification, all fine-tuned LLMs outperform the best existing non-LLM baseline, i.e., the CNN model, with a 4.4% to 7.2% improvement in F1 score. In the SATD classification task, while our largest fine-tuned model, Flan-T5-XL, still led in performance, the CNN model exhibited competitive results, even surpassing four of six LLMs. We also found that the largest Flan-T5 model, i.e., Flan-T5-XXL, when used with a zero-shot in-context learning (ICL) approach for SATD identification, provides competitive results with traditional approaches but performs 6.4% to 9.2% worse than fine-tuned LLMs. For SATD classification, few-shot ICL approach, incorporating examples and category descriptions in prompts, outperforms the zero-shot approach and even surpasses the fine-tuned smaller Flan-T5 models. Moreover, our experiments demonstrate that incorporating contextual information, such as surrounding code, into the SATD classification task enables larger fine-tuned LLMs to improve their performance.
- [78] arXiv:2405.06807 [pdf, ps, html, other]
-
Title: Tackling Execution-Based Evaluation for NL2BashSubjects: Computation and Language (cs.CL); Software Engineering (cs.SE)
Given recent advancement of Large Language Models (LLMs), the task of translating from natural language prompts to different programming languages (code generation) attracts immense attention for wide application in different domains. Specially code generation for Bash (NL2Bash) is widely used to generate Bash scripts for automating different tasks, such as performance monitoring, compilation, system administration, system diagnostics, etc. Besides code generation, validating synthetic code is critical before using them for any application. Different methods for code validation are proposed, both direct (execution evaluation) and indirect validations (i.e. exact/partial match, BLEU score). Among these, Execution-based Evaluation (EE) can validate the predicted code by comparing the execution output of model prediction and expected output in system. However, designing and implementing such an execution-based evaluation system for NL2Bash is not a trivial task. In this paper, we present a machinery for execution-based evaluation for NL2Bash. We create a set of 50 prompts to evaluate some popular LLMs for NL2Bash. We also analyze several advantages and challenges of EE such as syntactically different yet semantically equivalent Bash scripts generated by different LLMs, or syntactically correct but semantically incorrect Bash scripts, and how we capture and process them correctly.
- [79] arXiv:2405.06811 [pdf, ps, html, other]
-
Title: Shared Virtual Memory: Its Design and Performance Implications for Diverse ApplicationsComments: To be published in ICS '24Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Discrete GPU accelerators, while providing massive computing power for supercomputers and data centers, have their separate memory domain. Explicit memory management across device and host domains in programming is tedious and error-prone. To improve programming portability and productivity, Unified Memory (UM) integrates GPU memory into the host virtual memory systems, and provides transparent data migration between them and GPU memory oversubscription. Nevertheless, current UM technologies cause significant performance loss for applications. With AMD GPUs increasingly being integrated into the world's leading supercomputers, it is necessary to understand their Shared Virtual Memory (SVM) and mitigate the performance impacts. In this work, we delve into the SVM design, examine its interactions with applications' data accesses at fine granularity, and quantitatively analyze its performance effects on various applications and identify the performance bottlenecks. Our research reveals that SVM employs an aggressive prefetching strategy for demand paging. This prefetching is efficient when GPU memory is not oversubscribed. However, in tandem with the eviction policy, it causes excessive thrashing and performance degradation for certain applications under oversubscription. We discuss SVM-aware algorithms and SVM design changes to mitigate the performance impacts. To the best of our knowledge, this work is the first in-depth and comprehensive study for SVM technologies.
- [80] arXiv:2405.06814 [pdf, ps, html, other]
-
Title: Dual-Task Vision Transformer for Rapid and Accurate Intracerebral Hemorrhage Classification on CT ImagesComments: 9 pages, 4 figure3Subjects: Computer Vision and Pattern Recognition (cs.CV)
Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disabilities or death in patients. Diagnosis and analysis of ICH typically rely on brain CT imaging. Given the urgency of ICH conditions, early treatment is crucial, necessitating rapid analysis of CT images to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. Therefore, we built a dataset for ICH and normal classification and three types of ICH image classification based on the hemorrhage location, i.e., Deep, Subcortical, and Lobar. In addition, we propose a dual-task vision transformer (DTViT) for the automated classification and diagnosis of ICH images. This neural network utilizes the encoder from ViT, employing attention mechanisms for feature extraction from CT images. We incorporated two multilayer perception (MLP)-based decoders within the network to simultaneously identify the presence of ICH and classify three types of hemorrhage locations. Experimental results demonstrate that our proposed multi-classification network performs well on the built real-world test dataset. The code and dataset for this study will be made publicly available upon paper acceptance at: this https URL.
- [81] arXiv:2405.06816 [pdf, ps, html, other]
-
Title: Non-stationary Domain Generalization: Theory and AlgorithmComments: Accepted by UAI 2024Subjects: Machine Learning (cs.LG)
Although recent advances in machine learning have shown its success to learn from independent and identically distributed (IID) data, it is vulnerable to out-of-distribution (OOD) data in an open world. Domain generalization (DG) deals with such an issue and it aims to learn a model from multiple source domains that can be generalized to unseen target domains. Existing studies on DG have largely focused on stationary settings with homogeneous source domains. However, in many applications, domains may evolve along a specific direction (e.g., time, space). Without accounting for such non-stationary patterns, models trained with existing methods may fail to generalize on OOD data. In this paper, we study domain generalization in non-stationary environment. We first examine the impact of environmental non-stationarity on model performance and establish the theoretical upper bounds for the model error at target domains. Then, we propose a novel algorithm based on adaptive invariant representation learning, which leverages the non-stationary pattern to train a model that attains good performance on target domains. Experiments on both synthetic and real data validate the proposed algorithm.
- [82] arXiv:2405.06817 [pdf, ps, html, other]
-
Title: Coherent Design of Wind Turbine Controllers Considering Transitions between Operating Regions using Fuzzy Membership FunctionsSubjects: Systems and Control (eess.SY)
This paper presents a coherent design of wind turbine controllers with explicit consideration of transitions between operating regions by fuzzy membership functions. In improving the design process of wind turbines, the transitions between partial-load operation by torque control and full-load operation by pitch control need to be systematically considered. From the first view, fuzzy methods for blending separately designed control laws are an obvious choice. However, valid design rules must be developed to ensure stability and performance during the transition. A model-based control design procedure in the Takagi-Sugeno fuzzy framework using the sector nonlinearity method is proposed to achieve the above control design objectives. In addition to a detailed mathematical analysis of the design, the method's applicability is verified by simulation studies using a high-fidelity reference wind turbine model.
- [83] arXiv:2405.06818 [pdf, ps, html, other]
-
Title: The Ghanaian NLP Landscape: A First LookSubjects: Computation and Language (cs.CL)
Despite comprising one-third of global languages, African languages are critically underrepresented in Artificial Intelligence (AI), threatening linguistic diversity and cultural heritage. Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk. This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages, identifying methodologies, datasets, and techniques employed. Additionally, we create a detailed roadmap outlining challenges, best practices, and future directions, aiming to improve accessibility for researchers. This work serves as a foundational resource for Ghanaian NLP research and underscores the critical need for integrating global linguistic diversity into AI development.
- [84] arXiv:2405.06821 [pdf, ps, html, other]
-
Title: Synchronized Object Detection for Autonomous Sorting, Mapping, and Quantification of Medical MaterialsComments: To be submittedSubjects: Computer Vision and Pattern Recognition (cs.CV)
The circular economy paradigm is gaining interest as a solution to reduce both material supply uncertainties and waste generation. One of the main challenges is monitoring materials, since in general, something that is not measured cannot be effectively managed. In this paper, we propose real-time synchronized object detection to enable, at the same time, autonomous sorting, mapping, and quantification of end-of-life medical materials. Dataset, code, and demo videos are publicly available.
- [85] arXiv:2405.06822 [pdf, ps, html, other]
-
Title: MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data AnalysisComments: This paper is accepted by ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Federated learning is widely used in medical applications for training global models without needing local data access. However, varying computational capabilities and network architectures (system heterogeneity), across clients pose significant challenges in effectively aggregating information from non-independently and identically distributed (non-IID) data. Current federated learning methods using knowledge distillation require public datasets, raising privacy and data collection issues. Additionally, these datasets require additional local computing and storage resources, which is a burden for medical institutions with limited hardware conditions. In this paper, we introduce a novel federated learning paradigm, named Model Heterogeneous personalized Federated Learning via Injection and Distillation (MH-pFLID). Our framework leverages a lightweight messenger model that carries concentrated information to collect the information from each client. We also develop a set of receiver and transmitter modules to receive and send information from the messenger model, so that the information could be injected and distilled with efficiency.
- [86] arXiv:2405.06823 [pdf, ps, html, other]
-
Title: PLeak: Prompt Leaking Attacks against Large Language Model ApplicationsComments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large Language Models (LLMs) enable a new ecosystem with many downstream applications, called LLM applications, with different natural language processing tasks. The functionality and performance of an LLM application highly depend on its system prompt, which instructs the backend LLM on what task to perform. Therefore, an LLM application developer often keeps a system prompt confidential to protect its intellectual property. As a result, a natural attack, called prompt leaking, is to steal the system prompt from an LLM application, which compromises the developer's intellectual property. Existing prompt leaking attacks primarily rely on manually crafted queries, and thus achieve limited effectiveness.
In this paper, we design a novel, closed-box prompt leaking attack framework, called PLeak, to optimize an adversarial query such that when the attacker sends it to a target LLM application, its response reveals its own system prompt. We formulate finding such an adversarial query as an optimization problem and solve it with a gradient-based method approximately. Our key idea is to break down the optimization goal by optimizing adversary queries for system prompts incrementally, i.e., starting from the first few tokens of each system prompt step by step until the entire length of the system prompt.
We evaluate PLeak in both offline settings and for real-world LLM applications, e.g., those on Poe, a popular platform hosting such applications. Our results show that PLeak can effectively leak system prompts and significantly outperforms not only baselines that manually curate queries but also baselines with optimized queries that are modified and adapted from existing jailbreaking attacks. We responsibly reported the issues to Poe and are still waiting for their response. Our implementation is available at this repository: this https URL. - [87] arXiv:2405.06826 [pdf, ps, other]
-
Title: A Nominal Approach to Probabilistic Separation LogicSubjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
Currently, there is a gap between the tools used by probability theorists and those used in formal reasoning about probabilistic programs. On the one hand, a probability theorist decomposes probabilistic state along the simple and natural product of probability spaces. On the other hand, recently developed probabilistic separation logics decompose state via relatively unfamiliar measure-theoretic constructions for computing unions of sigma-algebras and probability measures. We bridge the gap between these two perspectives by showing that these two methods of decomposition are equivalent up to a suitable equivalence of categories. Our main result is a probabilistic analog of the classic equivalence between the category of nominal sets and the Schanuel topos. Through this equivalence, we validate design decisions in prior work on probabilistic separation logic and create new connections to nominal-set-like models of probability.
- [88] arXiv:2405.06827 [pdf, ps, html, other]
-
Title: Acceleration of Power System Dynamic Simulations using a Deep Equilibrium Layer and Neural ODE SurrogateMatthew Bossart, Jose Daniel Lara, Ciaran Roberts, Rodrigo Henriquez-Auba, Duncan Callaway, Bri-Mathias HodgeComments: This work has been submitted to the IEEE Transactions on Energy Conversion for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Systems and Control (eess.SY)
The dominant paradigm for power system dynamic simulation is to build system-level simulations by combining physics-based models of individual components. The sheer size of the system along with the rapid integration of inverter-based resources exacerbates the computational burden of running time domain simulations. In this paper, we propose a data-driven surrogate model based on implicit machine learning -- specifically deep equilibrium layers and neural ordinary differential equations -- to learn a reduced order model of a portion of the full underlying system. The data-driven surrogate achieves similar accuracy and reduction in simulation time compared to a physics-based surrogate, without the constraint of requiring detailed knowledge of the underlying dynamic models. This work also establishes key requirements needed to integrate the surrogate into existing simulation workflows; the proposed surrogate is initialized to a steady state operating point that matches the power flow solution by design.
- [89] arXiv:2405.06828 [pdf, ps, html, other]
-
Title: G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part GroupingComments: CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a novel task named "3D part grouping". Suppose there is a mixed set containing scattered parts from various shapes. This task requires algorithms to find out every possible combination among all the parts. To address this challenge, we propose the so called Gradient Field-based Auto-Regressive Sampling framework (G-FARS) tailored specifically for the 3D part grouping task. In our framework, we design a gradient-field-based selection graph neural network (GNN) to learn the gradients of a log conditional probability density in terms of part selection, where the condition is the given mixed part set. This innovative approach, implemented through the gradient-field-based selection GNN, effectively captures complex relationships among all the parts in the input. Upon completion of the training process, our framework becomes capable of autonomously grouping 3D parts by iteratively selecting them from the mixed part set, leveraging the knowledge acquired by the trained gradient-field-based selection GNN. Our code is available at: this https URL.
- [90] arXiv:2405.06829 [pdf, ps, html, other]
-
Title: Model Reference Control for Wind Turbine Systems in Full Load Region based on Takagi-Sugeno Fuzzy SystemsComments: 9 pages, 6 figuresSubjects: Systems and Control (eess.SY)
This paper presents a novel Model Reference Control (MRC) approach for wind turbine (WT) systems in the full load region employing a fuzzy Parallel Distribution Compensation Controller (PDC-C) derived using a Takagi-Sugeno (TS) fuzzy System approach. Through first-order Taylor series expansion, local linear submodels are generated and combined via triangular membership functions to develop a TS descriptor model. From here, the MRC PDC-C is synthesized by a constrained LMI optimization procedure, including damping characteristics of the elastic drive train, to track the desired rotor speed and generator torque based on the reference model dynamics. The controller is tested on the nonlinear WT model in simulation studies under various wind conditions, such as turbulent wind, wind gusts, and a Fault Ride Through (FRT) scenario where the generator torque is set to 0 p.u. for 150 ms.
- [91] arXiv:2405.06830 [pdf, ps, html, other]
-
Title: Towards Browser Controls to Protect Cookies from Malicious ExtensionsSubjects: Cryptography and Security (cs.CR)
Cookies provide a state management mechanism for the web and are often used for authentication, storing a user's session ID, and replacing their credentials in subsequent requests. These ``session cookies'' are valuable targets of attacks such as Session Hijacking and Fixation that attempt to steal them and gain unauthorized access to user accounts. Multiple controls such as the Secure and HttpOnly cookie attributes restrict cookie accessibility, effectively mitigating attacks from the network or malicious websites, but often ignoring untrusted extensions within the user's browser. Extensions are third-party HTML/JavaScript add-ons with access to several privileged APIs and can run on multiple websites at once. Unfortunately, this can provide malicious/compromised extensions with unrestricted access to session cookies.
In this work, we first conduct a study assessing the prevalence of extensions with these ``risky'' APIs (i.e., those enabling cookie modification and theft) and find that they are currently used by hundreds of millions of users. Motivated by this, we propose browser controls based on two new cookie attributes that protect cookies from malicious extensions: BrowserOnly and Tracked. The BrowserOnly attribute prevents accessing cookies from extensions altogether. While effective, not all cookies can be inaccessible. Cookies with the Tracked attribute remain accessible, are tied to a single browser, and record any modifications made by extensions. Thus, stolen Tracked cookies become unusable outside their original browser and servers can verify any modifications. To demonstrate these features' practicality, we implement CREAM (Cookie Restrictions for Extension Abuse Mitigation): a modified version of Chromium realizing these controls. Our evaluation indicates that CREAM controls effectively protect cookies from malicious extensions while incurring small run-time overheads. - [92] arXiv:2405.06831 [pdf, ps, html, other]
-
Title: Better Algorithms for Constructing Minimum Cost Markov Chains and AIFV CodesComments: Expanded version of paper appearing in ISIT 2024Subjects: Data Structures and Algorithms (cs.DS)
The problem of constructing optimal AIFV codes is a special case of that of constructing minimum cost Markov Chains. This paper provides the first complete proof of correctness for the previously known iterative algorithm for constructing such Markov chains.
A recent work describes how to efficiently solve the Markov Chain problem by first constructing a Markov Chain Polytope and then running the Ellipsoid algorithm for linear programming on it. This paper's second result is that, in the AIFV case, a special property of the polytope instead permits solving the corresponding linear program using simple binary search - [93] arXiv:2405.06832 [pdf, ps, html, other]
-
Title: Concolic Testing of JavaScript using SparkplugSubjects: Software Engineering (cs.SE)
JavaScript is prevalent in web and server apps, handling sensitive data. JS testing methods lag behind other languages. Insitu concolic testing for JS is effective but slow and complex. Our method enhances tracing with V8 Sparkplug baseline compiler and remill libraries for assembly to LLVM IR conversion. Evaluation on 160 Node.js libraries reveals comparable coverage and bug detection in significantly less time than the in-situ method.
- [94] arXiv:2405.06835 [pdf, ps, html, other]
-
Title: Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMsHarsh Patel, Buvaneswari A. Ramanan, Manzoor A. Khan, Thomas Williams, Brian Friedman, Lawrence DrabeckComments: The work was completed during 2Q, 3Q of Year 2023, when WizardCoder was the top performing Open source LLM for coding. Newer and better models have emerged since then. The processes and methodologies utilized for this benchmarking can still be utilized for evaluating the current SoTA modelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
This paper explores the possibilities of the current generation of Large Language Models for incorporating Machine Learning Operations (MLOps) functionalities into ML training code bases. We evaluate the performance of OpenAI (gpt-3.5-turbo) and WizardCoder (open-source, 15B parameters) models on the automated accomplishment of various MLOps functionalities in different settings. We perform a benchmarking study that assesses the ability of these models to: (1) adapt existing code samples (Inlining) with component-specific MLOps functionality such as MLflow and Weights & Biases for experiment tracking, Optuna for hyperparameter optimization etc., and (2) perform the task of Translation from one component of an MLOps functionality to another, e.g., translating existing GitPython library based version control code to Data Version Control library based. We also propose three different approaches that involve teaching LLMs to comprehend the API documentation of the components as a reference while accomplishing the Translation tasks. In our evaluations, the gpt-3.5-turbo model significantly outperforms WizardCoder by achieving impressive Pass@3 accuracy in model optimization (55% compared to 0% by WizardCoder), experiment tracking (100%, compared to 62.5% by WizardCoder), model registration (92% compared to 42% by WizardCoder) and hyperparameter optimization (83% compared to 58% by WizardCoder) on average, in their best possible settings, showcasing its superior code adaptability performance in complex MLOps tasks.
- [95] arXiv:2405.06840 [pdf, ps, html, other]
-
Title: MEIC: Re-thinking RTL Debug Automation using LLMsSubjects: Hardware Architecture (cs.AR); Software Engineering (cs.SE)
The deployment of Large Language Models (LLMs) for code debugging (e.g., C and Python) is widespread, benefiting from their ability to understand and interpret intricate concepts. However, in the semiconductor industry, utilising LLMs to debug Register Transfer Level (RTL) code is still insufficient, largely due to the underrepresentation of RTL-specific data in training sets. This work introduces a novel framework, Make Each Iteration Count (MEIC), which contrasts with traditional one-shot LLM-based debugging methods that heavily rely on prompt engineering, model tuning, and model training. MEIC utilises LLMs in an iterative process to overcome the limitation of LLMs in RTL code debugging, which is suitable for identifying and correcting both syntax and function errors, while effectively managing the uncertainties inherent in LLM operations. To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors. The experimental results demonstrate that the proposed debugging framework achieves fix rate of 93% for syntax errors and 78% for function errors, with up to 48x speedup in debugging processes when compared with experienced engineers. The Repo. of dataset and code: https://anonymous.4open.science/r/Verilog-Auto-Debug-6E7F/.
- [96] arXiv:2405.06841 [pdf, ps, html, other]
-
Title: Bridging the Gap: Protocol Towards Fair and Consistent Affect AnalysisComments: accepted at IEEE FG 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment. As these technologies play a pivotal role in decision-making, addressing biases across diverse subpopulation groups, including age, gender, and race, becomes paramount. Automatic affect analysis, at the intersection of physiology, psychology, and machine learning, has seen significant development. However, existing databases and methodologies lack uniformity, leading to biased evaluations. This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning. Emphasis is placed on fairness in evaluations. Extensive experiments with baseline and state-of-the-art methods demonstrate the impact of these changes, revealing the inadequacy of prior assessments. The findings underscore the importance of considering demographic attributes in affect analysis research and provide a foundation for more equitable methodologies. Our annotations, code and pre-trained models are available at: this https URL
- [97] arXiv:2405.06842 [pdf, ps, html, other]
-
Title: BitVMX: A CPU for Universal Computation on BitcoinSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
BitVMX is a new design for a virtual CPU to optimistically execute arbitrary programs on Bitcoin based on a challenge response game introduced in BitVM. Similar to BitVM1 we create a general-purpose CPU to be verified in Bitcoin script. Our design supports common architectures, such as RISC-V or MIPS. Our main contribution to the state of the art is a design that uses hash chains of program traces, memory mapped registers, and a new challenge-response protocol. We present a new message linking protocol as a means to allow authenticated communication between the participants. This protocol emulates stateful smart contracts by sharing state between transactions. This provides a basis for our verification game which uses a graph of pre-signed transactions to support challenge-response interactions. In case of a dispute, the hash chain of program trace is used with selective pre-signed transactions to locate (via $n$-ary search) and then recover the precise nature of errors in the computation. Unlike BitVM1, our approach does not require the creation of Merkle trees for CPU instructions or memory words. Additionally, it does not rely on signature equivocations. These differences help avoid complexities associated with BitVM1 and make BitVMX a compelling alternative to BitVM2. Our approach is quite flexible, BitVMX can be instantiated to balance transaction cost vs round complexity, prover cost vs verifier cost, and precomputations vs round complexity.
- [98] arXiv:2405.06845 [pdf, ps, html, other]
-
Title: CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized CamerasComments: Accepted to the 18th IEEE International Conference on Automatic Face and Gesture RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
It is now possible to estimate 3D human pose from monocular images with off-the-shelf 3D pose estimators. However, many practical applications require fine-grained absolute pose information for which multi-view cues and camera calibration are necessary. Such multi-view recordings are laborious because they require manual calibration, and are expensive when using dedicated hardware. Our goal is full automation, which includes temporal synchronization, as well as intrinsic and extrinsic camera calibration. This is done by using persons in the scene as the calibration objects. Existing methods either address only synchronization or calibration, assume one of the former as input, or have significant limitations. A common limitation is that they only consider single persons, which eases correspondence finding. We attain this generality by partitioning the high-dimensional time and calibration space into a cascade of subspaces and introduce tailored algorithms to optimize each efficiently and robustly. The outcome is an easy-to-use, flexible, and robust motion capture toolbox that we release to enable scientific applications, which we demonstrate on diverse multi-view benchmarks. Project website: this https URL.
- [99] arXiv:2405.06846 [pdf, ps, html, other]
-
Title: Dominion: A New Frontier for AI ResearchSubjects: Artificial Intelligence (cs.AI)
In recent years, machine learning approaches have made dramatic advances, reaching superhuman performance in Go, Atari, and poker variants. These games, and others before them, have served not only as a testbed but have also helped to push the boundaries of AI research. Continuing this tradition, we examine the tabletop game Dominion and discuss the properties that make it well-suited to serve as a benchmark for the next generation of reinforcement learning (RL) algorithms. We also present the Dominion Online Dataset, a collection of over 2,000,000 games of Dominion played by experienced players on the Dominion Online webserver. Finally, we introduce an RL baseline bot that uses existing techniques to beat common heuristic-based bots, and shows competitive performance against the previously strongest bot, Provincial.
- [100] arXiv:2405.06848 [pdf, ps, html, other]
-
Title: ISR: Invertible Symbolic RegressionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
We introduce an Invertible Symbolic Regression (ISR) method. It is a machine learning technique that generates analytical relationships between inputs and outputs of a given dataset via invertible maps (or architectures). The proposed ISR method naturally combines the principles of Invertible Neural Networks (INNs) and Equation Learner (EQL), a neural network-based symbolic architecture for function learning. In particular, we transform the affine coupling blocks of INNs into a symbolic framework, resulting in an end-to-end differentiable symbolic invertible architecture that allows for efficient gradient-based learning. The proposed ISR framework also relies on sparsity promoting regularization, allowing the discovery of concise and interpretable invertible expressions. We show that ISR can serve as a (symbolic) normalizing flow for density estimation tasks. Furthermore, we highlight its practical applicability in solving inverse problems, including a benchmark inverse kinematics problem, and notably, a geoacoustic inversion problem in oceanography aimed at inferring posterior distributions of underlying seabed parameters from acoustic signals.
- [101] arXiv:2405.06849 [pdf, ps, html, other]
-
Title: GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNsComments: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision. A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction. To solve this issue, we propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN as it limits the number of considered graph connections made within an image. Additionally, we propose a novel CNN-GNN architecture, GreedyViG, which uses DAGC. Extensive experiments show that GreedyViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification, object detection, instance segmentation, and semantic segmentation tasks. Our smallest model, GreedyViG-S, achieves 81.1% top-1 accuracy on ImageNet-1K, 2.9% higher than Vision GNN and 2.2% higher than Vision HyperGraph Neural Network (ViHGNN), with less GMACs and a similar number of parameters. Our largest model, GreedyViG-B obtains 83.9% top-1 accuracy, 0.2% higher than Vision GNN, with a 66.6% decrease in parameters and a 69% decrease in GMACs. GreedyViG-B also obtains the same accuracy as ViHGNN with a 67.3% decrease in parameters and a 71.3% decrease in GMACs. Our work shows that hybrid CNN-GNN architectures not only provide a new avenue for designing efficient models, but that they can also exceed the performance of current state-of-the-art models.
- [102] arXiv:2405.06855 [pdf, ps, html, other]
-
Title: Linear Explanations for Individual NeuronsComments: Published in ICML 2024Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
In recent years many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically only focus on explaining the very highest activations of a neuron. In this paper we show this is not sufficient, and that the highest activation range is only responsible for a very small percentage of the neuron's causal effect. In addition, inputs causing lower activations are often very different and can't be reliably predicted by only looking at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and develop an efficient method for producing these linear explanations. In addition, we show how to automatically evaluate description quality using simulation, i.e. predicting neuron activations on unseen inputs in vision setting.
- [103] arXiv:2405.06856 [pdf, ps, html, other]
-
Title: Aladdin: Joint Placement and Scaling for SLO-Aware LLM ServingSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The demand for large language model (LLM) inference is gradually dominating the artificial intelligence workloads. Therefore, there is an urgent need for cost-efficient inference serving. Existing work focuses on single-worker optimization and lacks consideration of cluster-level management for both inference queries and computing resources. However, placing requests and managing resources without considering the query features easily causes SLO violations or resource underutilization. Providers are forced to allocate extra computing resources to guarantee user experience, leading to additional serving costs. In this paper we introduce Aladdin, a scheduler that co-adaptively places queries and scales computing resources with SLO awareness. For a stream of inference queries, Aladdin first predicts minimal computing resources and the corresponding serving workers' configuration required to fulfill the SLOs for all queries. Then, it places the queries to each serving worker according to the prefill and decode latency models of batched LLM inference to maximize each worker's utilization. Results show that Aladdin reduces the serving cost of a single model by up to 71% for the same SLO level compared with the baselines, which can be millions of dollars per year.
- [104] arXiv:2405.06859 [pdf, ps, other]
-
Title: Reimplementation of Learning to Reweight Examples for Robust Deep LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Deep neural networks (DNNs) have been used to create models for many complex analysis problems like image recognition and medical diagnosis. DNNs are a popular tool within machine learning due to their ability to model complex patterns and distributions. However, the performance of these networks is highly dependent on the quality of the data used to train the models. Two characteristics of these sets, noisy labels and training set biases, are known to frequently cause poor generalization performance as a result of overfitting to the training set. This paper aims to solve this problem using the approach proposed by Ren et al. (2018) using meta-training and online weight approximation. We will first implement a toy-problem to crudely verify the claims made by the authors of Ren et al. (2018) and then venture into using the approach to solve a real world problem of Skin-cancer detection using an imbalanced image dataset.
- [105] arXiv:2405.06865 [pdf, ps, html, other]
-
Title: Disrupting Style Mimicry Attacks on Video ImagerySubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Generative AI models are often used to perform mimicry attacks, where a pretrained model is fine-tuned on a small sample of images to learn to mimic a specific artist of interest. While researchers have introduced multiple anti-mimicry protection tools (Mist, Glaze, Anti-Dreambooth), recent evidence points to a growing trend of mimicry models using videos as sources of training data. This paper presents our experiences exploring techniques to disrupt style mimicry on video imagery. We first validate that mimicry attacks can succeed by training on individual frames extracted from videos. We show that while anti-mimicry tools can offer protection when applied to individual frames, this approach is vulnerable to an adaptive countermeasure that removes protection by exploiting randomness in optimization results of consecutive (nearly-identical) frames. We develop a new, tool-agnostic framework that segments videos into short scenes based on frame-level similarity, and use a per-scene optimization baseline to remove inter-frame randomization while reducing computational cost. We show via both image level metrics and an end-to-end user study that the resulting protection restores protection against mimicry (including the countermeasure). Finally, we develop another adaptive countermeasure and find that it falls short against our framework.
- [106] arXiv:2405.06869 [pdf, ps, html, other]
-
Title: Sharpness-Aware Minimization for Evolutionary Feature Construction in RegressionComments: Submitted to IEEE Transactions on Pattern Analysis and Machine IntelligenceSubjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In recent years, genetic programming (GP)-based evolutionary feature construction has achieved significant success. However, a primary challenge with evolutionary feature construction is its tendency to overfit the training data, resulting in poor generalization on unseen data. In this research, we draw inspiration from PAC-Bayesian theory and propose using sharpness-aware minimization in function space to discover symbolic features that exhibit robust performance within a smooth loss landscape in the semantic space. By optimizing sharpness in conjunction with cross-validation loss, as well as designing a sharpness reduction layer, the proposed method effectively mitigates the overfitting problem of GP, especially when dealing with a limited number of instances or in the presence of label noise. Experimental results on 58 real-world regression datasets show that our approach outperforms standard GP as well as six state-of-the-art complexity measurement methods for GP in controlling overfitting. Furthermore, the ensemble version of GP with sharpness-aware minimization demonstrates superior performance compared to nine fine-tuned machine learning and symbolic regression algorithms, including XGBoost and LightGBM.
- [107] arXiv:2405.06870 [pdf, ps, html, other]
-
Title: Noise-Tolerant Codebooks for Semi-Quantitative Group Testing: Application to Spatial GenomicsComments: To appear in ISIT 2024 ProceedingsSubjects: Information Theory (cs.IT)
Motivated by applications in spatial genomics, we revisit group testing (Dorfman~1943) and propose the class of $\lambda$-{\sf ADD}-codes, studying such codes with certain distance $d$ and codelength $n$. When $d$ is constant, we provide explicit code constructions with rates close to $1/2$. When $d$ is proportional to $n$, we provide a GV-type lower bound whose rates are efficiently computable. Upper bounds for such codes are also studied.
- [108] arXiv:2405.06871 [pdf, ps, html, other]
-
Title: Statistical Error of Numerical Integrators for Underdamped Langevin Dynamics with Deterministic And Stochastic GradientsSubjects: Numerical Analysis (math.NA); Probability (math.PR)
We propose a novel discrete Poisson equation approach to estimate the statistical error of a broad class of numerical integrators for the underdamped Langevin dynamics. The statistical error refers to the mean square error of the estimator to the exact ensemble average with a finite number of iterations. With the proposed error analysis framework, we show that when the potential function $U(x)$ is strongly convex in $\mathbb R^d$ and the numerical integrator has strong order $p$, the statistical error is $O(h^{2p}+\frac1{Nh})$, where $h$ is the time step and $N$ is the number of iterations. Besides, this approach can be adopted to analyze integrators with stochastic gradients, and quantitative estimates can be derived as well. Our approach only requires the geometric ergodicity of the continuous-time underdamped Langevin dynamics, and relaxes the constraint on the time step.
- [109] arXiv:2405.06872 [pdf, ps, html, other]
-
Title: eCAR: edge-assisted Collaborative Augmented Reality FrameworkSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We propose a novel edge-assisted multi-user collaborative augmented reality framework in a large indoor environment. In Collaborative Augmented Reality, data communication that synchronizes virtual objects has large network traffic and high network latency. Due to drift, CAR applications without continuous data communication for coordinate system alignment have virtual object inconsistency. In addition, synchronization messages for online virtual object updates have high latency as the number of collaborative devices increases. To solve this problem, we implement the CAR framework, called eCAR, which utilizes edge computing to continuously match the device's coordinate system with less network traffic. Furthermore, we extend the co-visibility graph of the edge server to maintain virtual object spatial-temporal consistency in neighboring devices by synchronizing a local graph. We evaluate the system quantitatively and qualitatively in the public dataset and a physical indoor environment. eCAR communicates data for coordinate system alignment between the edge server and devices with less network traffic and latency. In addition, collaborative augmented reality synchronization algorithms quickly and accurately host and resolve virtual objects. The proposed system continuously aligns coordinate systems to multiple devices in a large indoor environment and shares augmented reality content. Through our system, users interact with virtual objects and share augmented reality experiences with neighboring users.
- [110] arXiv:2405.06874 [pdf, ps, html, other]
-
Title: An Interior Penalty Discontinuous Galerkin Method for an Interface Model of Flow in Fractured Porous MediaSubjects: Numerical Analysis (math.NA)
Discrete fracture models with reduced-dimensional treatment of conductive and blocking fractures are widely used to simulate fluid flow in fractured porous media. Among these, numerical methods based on interface models are intensively studied, where the fractures are treated as co-dimension one manifolds in a bulk matrix with low-dimensional governing equations. In this paper, we propose a simple yet effective treatment for modeling the fractures on fitted grids in the interior penalty discontinuous Galerkin (IPDG) methods without introducing any additional degrees of freedom or equations on the interfaces. We conduct stability and {\em hp}-analysis for the proposed IPDG method, deriving optimal a priori error bounds concerning mesh size ($h$) and sub-optimal bounds for polynomial degree ($k$) in both the energy norm and the $L^2$ norm. Numerical experiments involving published benchmarks validate our theoretical analysis and demonstrate the method's robust performance. Furthermore, we extend our method to two-phase flows and use numerical tests to confirm the algorithm's validity.
- [111] arXiv:2405.06875 [pdf, ps, other]
-
Title: LogicAL: Towards logical anomaly synthesis for unsupervised anomaly localizationComments: Accepted to Visual Anomaly and Novelty Detection (VAND) 2.0 Workshop at CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Anomaly localization is a practical technology for improving industrial production line efficiency. Due to anomalies are manifold and hard to be collected, existing unsupervised researches are usually equipped with anomaly synthesis methods. However, most of them are biased towards structural defects synthesis while ignoring the underlying logical constraints. To fill the gap and boost anomaly localization performance, we propose an edge manipulation based anomaly synthesis framework, named LogicAL, that produces photo-realistic both logical and structural anomalies. We introduce a logical anomaly generation strategy that is adept at breaking logical constraints and a structural anomaly generation strategy that complements to the structural defects synthesis. We further improve the anomaly localization performance by introducing edge reconstruction into the network structure. Extensive experiments on the challenge MVTecLOCO, MVTecAD, VisA and MADsim datasets verify the advantage of proposed LogicAL on both logical and structural anomaly localization.
- [112] arXiv:2405.06884 [pdf, ps, html, other]
-
Title: Efficient PAC Learnability of Dynamical Systems Over Multilayer NetworksZirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil VullikantiComments: Accepted at ICML 2024Subjects: Machine Learning (cs.LG)
Networked dynamical systems are widely used as formal models of real-world cascading phenomena, such as the spread of diseases and information. Prior research has addressed the problem of learning the behavior of an unknown dynamical system when the underlying network has a single layer. In this work, we study the learnability of dynamical systems over multilayer networks, which are more realistic and challenging. First, we present an efficient PAC learning algorithm with provable guarantees to show that the learner only requires a small number of training examples to infer an unknown system. We further provide a tight analysis of the Natarajan dimension which measures the model complexity. Asymptotically, our bound on the Nararajan dimension is tight for almost all multilayer graphs. The techniques and insights from our work provide the theoretical foundations for future investigations of learning problems for multilayer dynamical systems.
- [113] arXiv:2405.06886 [pdf, ps, html, other]
-
Title: Event GDR: Event-Centric Generative Document RetrievalComments: Accepted to WWW 2024Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Generative document retrieval, an emerging paradigm in information retrieval, learns to build connections between documents and identifiers within a single model, garnering significant attention. However, there are still two challenges: (1) neglecting inner-content correlation during document representation; (2) lacking explicit semantic structure during identifier construction. Nonetheless, events have enriched relations and well-defined taxonomy, which could facilitate addressing the above two challenges. Inspired by this, we propose Event GDR, an event-centric generative document retrieval model, integrating event knowledge into this task. Specifically, we utilize an exchange-then-reflection method based on multi-agents for event knowledge extraction. For document representation, we employ events and relations to model the document to guarantee the comprehensiveness and inner-content correlation. For identifier construction, we map the events to well-defined event taxonomy to construct the identifiers with explicit semantic structure. Our method achieves significant improvement over the baselines on two datasets, and also hopes to provide insights for future research.
- [114] arXiv:2405.06887 [pdf, ps, html, other]
-
Title: FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality AssessmentComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Existing action quality assessment (AQA) methods mainly learn deep representations at the video level for scoring diverse actions. Due to the lack of a fine-grained understanding of actions in videos, they harshly suffer from low credibility and interpretability, thus insufficient for stringent applications, such as Olympic diving events. We argue that a fine-grained understanding of actions requires the model to perceive and parse actions in both time and space, which is also the key to the credibility and interpretability of the AQA technique. Based on this insight, we propose a new fine-grained spatial-temporal action parser named \textbf{FineParser}. It learns human-centric foreground action representations by focusing on target action regions within each frame and exploiting their fine-grained alignments in time and space to minimize the impact of invalid backgrounds during the assessment. In addition, we construct fine-grained annotations of human-centric foreground action masks for the FineDiving dataset, called \textbf{FineDiving-HM}. With refined annotations on diverse target action procedures, FineDiving-HM can promote the development of real-world AQA systems. Through extensive experiments, we demonstrate the effectiveness of FineParser, which outperforms state-of-the-art methods while supporting more tasks of fine-grained action understanding. Data and code are available at \url{this https URL}.
- [115] arXiv:2405.06890 [pdf, ps, html, other]
-
Title: TacoERE: Cluster-aware Compression for Event Relation ExtractionComments: Accepted to LREC-COLING 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Event relation extraction (ERE) is a critical and fundamental challenge for natural language processing. Existing work mainly focuses on directly modeling the entire document, which cannot effectively handle long-range dependencies and information redundancy. To address these issues, we propose a cluster-aware compression method for improving event relation extraction (TacoERE), which explores a compression-then-extraction paradigm. Specifically, we first introduce document clustering for modeling event dependencies. It splits the document into intra- and inter-clusters, where intra-clusters aim to enhance the relations within the same cluster, while inter-clusters attempt to model the related events at arbitrary distances. Secondly, we utilize cluster summarization to simplify and highlight important text content of clusters for mitigating information redundancy and event distance. We have conducted extensive experiments on both pre-trained language models, such as RoBERTa, and large language models, such as ChatGPT and GPT-4, on three ERE datasets, i.e., MAVEN-ERE, EventStoryLine and HiEve. Experimental results demonstrate that TacoERE is an effective method for ERE.
- [116] arXiv:2405.06893 [pdf, ps, html, other]
-
Title: ADLDA: A Method to Reduce the Harm of Data Distribution Shift in Data AugmentationComments: 8 page 4 figSubjects: Computer Vision and Pattern Recognition (cs.CV)
This study introduces a novel data augmentation technique, ADLDA, aimed at mitigating the negative impact of data distribution shifts caused by the data augmentation process in computer vision task. ADLDA partitions augmented data into distinct subdomains and incorporates domain labels, combined with domain adaptation techniques, to optimize data representation in the model's feature space. Experimental results demonstrate that ADLDA significantly enhances model performance across multiple datasets, particularly in neural network architectures with complex feature extraction layers. Furthermore, ADLDA improves the model's ability to locate and recognize key features, showcasing potential in object recognition and image segmentation tasks. This paper's contribution provides an effective data augmentation regularization method for the field of computer vision aiding in the enhancement of robustness and accuracy in deep learning models.
- [117] arXiv:2405.06895 [pdf, ps, other]
-
Title: Unveiling the Era of Spatial ComputingSubjects: Human-Computer Interaction (cs.HC)
The evolution of User Interfaces marks a significant transition from traditional command-line interfaces to more intuitive graphical and touch-based interfaces, largely driven by the emergence of personal computing devices. The advent of spatial computing and Extended Reality technologies further pushes the boundaries, promising a fusion of physical and digital realms through interactive environments. This paper delves into the progression from All Realities technologies encompassing Augmented Reality, Virtual Reality, and Mediated Reality to spatial computing, highlighting their conceptual differences and applications. We explore enabling technologies such as Artificial Intelligence, the Internet of Things, 5G, cloud and edge computing, and blockchain that underpin the development of spatial computing. We further scrutinize the initial forays into commercial spatial computing devices, with a focus on Apple's Vision Pro, evaluating its technological advancements alongside the challenges it faces. Through this examination, we aim to provide insights into the potential of spatial computing to revolutionize our interaction with digital information and the physical world.
- [118] arXiv:2405.06899 [pdf, ps, html, other]
-
Title: Towards Metric DBSCAN: Exact, Approximate, and Streaming AlgorithmsSubjects: Data Structures and Algorithms (cs.DS)
DBSCAN is a popular density-based clustering algorithm that has many different applications in practice. However, the running time of DBSCAN in high-dimensional space or general metric space ({\em e.g.,} clustering a set of texts by using edit distance) can be as large as quadratic in the input size. Moreover, most of existing accelerating techniques for DBSCAN are only available for low-dimensional Euclidean space. In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension (which is a realistic assumption for many high-dimensional applications), where the outliers can locate anywhere in the space without any assumption. First, we propose a $k$-center clustering based algorithm that can reduce the time-consuming labeling and merging tasks of DBSCAN to be linear. Further, we propose a linear time approximate DBSCAN algorithm, where the key idea is building a novel small-size summary for the core points. Also, our algorithm can be efficiently implemented for streaming data and the required memory is independent of the input size. Finally, we conduct our experiments and compare our algorithms with several popular DBSCAN algorithms. The experimental results suggest that our proposed approach can significantly reduce the computational complexity in practice.
- [119] arXiv:2405.06902 [pdf, ps, other]
-
Title: Causal Inference from Slowly Varying Nonstationary ProcessesComments: Accepted to the IEEE Transactions on Signal and Information Processing over Networks. arXiv admin note: substantial text overlap with arXiv:2012.13025Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Causal inference from observational data following the restricted structural causal models (SCM) framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms, such as non-Gaussianity or non-linearity. This methodology can be adapted to stationary time series, yet inferring causal relationships from nonstationary time series remains a challenging task. In this work, we propose a new class of restricted SCM, via a time-varying filter and stationary noise, and exploit the asymmetry from nonstationarity for causal identification in both bivariate and network settings. We propose efficient procedures by leveraging powerful estimates of the bivariate evolutionary spectra for slowly varying processes. Various synthetic and real datasets that involve high-order and non-smooth filters are evaluated to demonstrate the effectiveness of our proposed methodology.
- [120] arXiv:2405.06903 [pdf, ps, html, other]
-
Title: UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual CorrespondenceComments: CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, while highly challenging due to the diversity of garment configurations, geometries and deformations. Although able to manipulate similar shaped garments in a certain task, previous works mostly have to design different policies for different tasks, could not generalize to garments with diverse geometries, and often rely heavily on human-annotated data. In this paper, we leverage the property that, garments in a certain category have similar structures, and then learn the topological dense (point-level) visual correspondence among garments in the category level with different deformations in the self-supervised manner. The topological correspondence can be easily adapted to the functional correspondence to guide the manipulation policies for various downstream tasks, within only one or few-shot demonstrations. Experiments over garments in 3 different categories on 3 representative tasks in diverse scenarios, using one or two arms, taking one or more steps, inputting flat or messy garments, demonstrate the effectiveness of our proposed method. Project page: this https URL.
- [121] arXiv:2405.06904 [pdf, ps, other]
-
Title: Generation of Granular-Balls for Clustering Based on the Principle of Justifiable GranularitySubjects: Machine Learning (cs.LG)
Efficient and robust data clustering remains a challenging task in the field of data analysis. Recent efforts have explored the integration of granular-ball (GB) computing with clustering algorithms to address this challenge, yielding promising results. However, existing methods for generating GBs often rely on single indicators to measure GB quality and employ threshold-based or greedy strategies, potentially leading to GBs that do not accurately capture the underlying data distribution. To address these limitations, this article introduces a novel GB generation method. The originality of this method lies in leveraging the principle of justifiable granularity to measure the quality of a GB for clustering tasks. To be precise, we define the coverage and specificity of a GB and introduce a comprehensive measure for assessing GB quality. Utilizing this quality measure, the method incorporates a binary tree pruning-based strategy and an anomaly detection method to determine the best combination of sub-GBs for each GB and identify abnormal GBs, respectively. Compared to previous GB generation methods, the new method maximizes the overall quality of generated GBs while ensuring alignment with the data distribution, thereby enhancing the rationality of the generated GBs. Experimental results obtained from both synthetic and publicly available datasets underscore the effectiveness of the proposed GB generation method, showcasing improvements in clustering accuracy and normalized mutual information.
- [122] arXiv:2405.06906 [pdf, ps, html, other]
-
Title: Finding structure in logographic writing with library learningComments: Accepted at CogSci 2024 (Talk)Subjects: Computation and Language (cs.CL)
One hallmark of human language is its combinatoriality -- reusing a relatively small inventory of building blocks to create a far larger inventory of increasingly complex structures. In this paper, we explore the idea that combinatoriality in language reflects a human inductive bias toward representational efficiency in symbol systems. We develop a computational framework for discovering structure in a writing system. Built on top of state-of-the-art library learning and program synthesis techniques, our computational framework discovers known linguistic structures in the Chinese writing system and reveals how the system evolves towards simplification under pressures for representational efficiency. We demonstrate how a library learning approach, utilizing learned abstractions and compression, may help reveal the fundamental computational principles that underlie the creation of combinatorial structures in human cognition, and offer broader insights into the evolution of efficient communication systems.
- [123] arXiv:2405.06907 [pdf, ps, html, other]
-
Title: CoRE: LLM as Interpreter for Natural Language Programming, Pseudo-Code Programming, and Flow Programming of AI AgentsComments: 12 pages, 6 figures, comments and suggestions are welcomeSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
Since their inception, programming languages have trended towards greater readability and lower barriers for programmers. Following this trend, natural language can be a promising type of programming language that provides great flexibility and usability and helps towards the democracy of programming. However, the inherent vagueness, ambiguity, and verbosity of natural language pose significant challenges in developing an interpreter that can accurately understand the programming logic and execute instructions written in natural language. Fortunately, recent advancements in Large Language Models (LLMs) have demonstrated remarkable proficiency in interpreting complex natural language. Inspired by this, we develop a novel system for Code Representation and Execution (CoRE), which employs LLM as interpreter to interpret and execute natural language instructions. The proposed system unifies natural language programming, pseudo-code programming, and flow programming under the same representation for constructing language agents, while LLM serves as the interpreter to interpret and execute the agent programs. In this paper, we begin with defining the programming syntax that structures natural language instructions logically. During the execution, we incorporate external memory to minimize redundancy. Furthermore, we equip the designed interpreter with the capability to invoke external tools, compensating for the limitations of LLM in specialized domains or when accessing real-time information. This work is open-source at this https URL.
- [124] arXiv:2405.06908 [pdf, ps, html, other]
-
Title: To Ask or Not To Ask: Human-in-the-loop Contextual Bandits with Applications in Robot-Assisted FeedingRohan Banerjee, Rajat Kumar Jenamani, Sidharth Vasudev, Amal Nanavati, Sarah Dean, Tapomayukh BhattacharjeeComments: Under submission to IROS 2024. The second and third authors contributed equally. The last two authors advised equallySubjects: Robotics (cs.RO)
Robot-assisted bite acquisition involves picking up food items that vary in their shape, compliance, size, and texture. A fully autonomous strategy for bite acquisition is unlikely to efficiently generalize to this wide variety of food items. We propose to leverage the presence of the care recipient to provide feedback when the system encounters novel food items. However, repeatedly asking for help imposes cognitive workload on the user. In this work, we formulate human-in-the-loop bite acquisition within a contextual bandit framework and propose a novel method, LinUCB-QG, that selectively asks for help. This method leverages a predictive model of cognitive workload in response to different types and timings of queries, learned using data from 89 participants collected in an online user study. We demonstrate that this method enhances the balance between task performance and cognitive workload compared to autonomous and querying baselines, through experiments in a food dataset-based simulator and a user study with 18 participants without mobility limitations.
- [125] arXiv:2405.06909 [pdf, ps, other]
-
Title: Fairness in Reinforcement Learning: A SurveyComments: 10 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems before showcasing the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.
- [126] arXiv:2405.06910 [pdf, ps, html, other]
-
Title: Generative flow induced neural architecture search: Towards discovering optimal architecture in wavelet neural operatorSubjects: Machine Learning (cs.LG)
We propose a generative flow-induced neural architecture search algorithm. The proposed approach devices simple feed-forward neural networks to learn stochastic policies to generate sequences of architecture hyperparameters such that the generated states are in proportion with the reward from the terminal state. We demonstrate the efficacy of the proposed search algorithm on the wavelet neural operator (WNO), where we learn a policy to generate a sequence of hyperparameters like wavelet basis and activation operators for wavelet integral blocks. While the trajectory of the generated wavelet basis and activation sequence is cast as flow, the policy is learned by minimizing the flow violation between each state in the trajectory and maximizing the reward from the terminal state. In the terminal state, we train WNO simultaneously to guide the search. We propose to use the exponent of the negative of the WNO loss on the validation dataset as the reward function. While the grid search-based neural architecture generation algorithms foresee every combination, the proposed framework generates the most probable sequence based on the positive reward from the terminal state, thereby reducing exploration time. Compared to reinforcement learning schemes, where complete episodic training is required to get the reward, the proposed algorithm generates the hyperparameter trajectory sequentially. Through four fluid mechanics-oriented problems, we illustrate that the learned policies can sample the best-performing architecture of the neural operator, thereby improving the performance of the vanilla wavelet neural operator.
- [127] arXiv:2405.06911 [pdf, ps, other]
-
Title: Replication Study and Benchmarking of Real-Time Object Detection ModelsComments: Authors are presented in alphabetical order, each having equal contribution to the work. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Computer Vision and Pattern Recognition (cs.CV)
This work examines the reproducibility and benchmarking of state-of-the-art real-time object detection models. As object detection models are often used in real-world contexts, such as robotics, where inference time is paramount, simply measuring models' accuracy is not enough to compare them. We thus compare a large variety of object detection models' accuracy and inference speed on multiple graphics cards. In addition to this large benchmarking attempt, we also reproduce the following models from scratch using PyTorch on the MS COCO 2017 dataset: DETR, RTMDet, ViTDet and YOLOv7. More importantly, we propose a unified training and evaluation pipeline, based on MMDetection's features, to better compare models. Our implementation of DETR and ViTDet could not achieve accuracy or speed performances comparable to what is declared in the original papers. On the other hand, reproduced RTMDet and YOLOv7 could match such performances. Studied papers are also found to be generally lacking for reproducibility purposes. As for MMDetection pretrained models, speed performances are severely reduced with limited computing resources (larger, more accurate models even more so). Moreover, results exhibit a strong trade-off between accuracy and speed, prevailed by anchor-free models - notably RTMDet or YOLOx models. The code used is this paper and all the experiments is available in the repository at this https URL.
- [128] arXiv:2405.06914 [pdf, ps, html, other]
-
Title: Non-confusing Generation of Customized Concepts in Diffusion ModelsWang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs). It becomes even more pronounced in the generation of customized concepts, due to the scarcity of user-provided concept visual examples. By revisiting the two major stages leading to the success of TGDMs -- 1) contrastive image-language pre-training (CLIP) for text encoder that encodes visual semantics, and 2) training TGDM that decodes the textual embeddings into pixels -- we point that existing customized generation methods only focus on fine-tuning the second stage while overlooking the first one. To this end, we propose a simple yet effective solution called CLIF: contrastive image-language fine-tuning. Specifically, given a few samples of customized concepts, we obtain non-confusing textual embeddings of a concept by fine-tuning CLIP via contrasting a concept and the over-segmented visual regions of other concepts. Experimental results demonstrate the effectiveness of CLIF in preventing the confusion of multi-customized concept generation.
- [129] arXiv:2405.06915 [pdf, ps, other]
-
Title: Automating CreativityComments: 46 pages, 2 tables, 4 figuresSubjects: Artificial Intelligence (cs.AI)
Generative AI (GenAI) has spurred the expectation of being creative, due to its ability to generate content, yet so far, its creativity has somewhat disappointed, because it is trained using existing data following human intentions to generate outputs. The purpose of this paper is to explore what is required to evolve AI from generative to creative. Based on a reinforcement learning approach and building upon various research streams of computational creativity, we develop a triple prompt-response-reward engineering framework to develop the creative capability of GenAI. This framework consists of three components: 1) a prompt model for expected creativity by developing discriminative prompts that are objectively, individually, or socially novel, 2) a response model for observed creativity by generating surprising outputs that are incrementally, disruptively, or radically innovative, and 3) a reward model for improving creativity over time by incorporating feedback from the AI, the creator/manager, and/or the customers. This framework enables the application of GenAI for various levels of creativity strategically.
- [130] arXiv:2405.06916 [pdf, ps, html, other]
-
Title: High-order Neighborhoods Know More: HyperGraph Learning Meets Source-free Unsupervised Domain AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Source-free Unsupervised Domain Adaptation (SFDA) aims to classify target samples by only accessing a pre-trained source model and unlabelled target samples. Since no source data is available, transferring the knowledge from the source domain to the target domain is challenging. Existing methods normally exploit the pair-wise relation among target samples and attempt to discover their correlations by clustering these samples based on semantic features. The drawback of these methods includes: 1) the pair-wise relation is limited to exposing the underlying correlations of two more samples, hindering the exploration of the structural information embedded in the target domain; 2) the clustering process only relies on the semantic feature, while overlooking the critical effect of domain shift, i.e., the distribution differences between the source and target domains. To address these issues, we propose a new SFDA method that exploits the high-order neighborhood relation and explicitly takes the domain shift effect into account. Specifically, we formulate the SFDA as a Hypergraph learning problem and construct hyperedges to explore the local group and context information among multiple samples. Moreover, we integrate a self-loop strategy into the constructed hypergraph to elegantly introduce the domain uncertainty of each sample. By clustering these samples based on hyperedges, both the semantic feature and domain shift effects are considered. We then describe an adaptive relation-based objective to tune the model with soft attention levels for all samples. Extensive experiments are conducted on Office-31, Office-Home, VisDA, and PointDA-10 datasets. The results demonstrate the superiority of our method over state-of-the-art counterparts.
- [131] arXiv:2405.06917 [pdf, ps, other]
-
Title: Design Requirements for Human-Centered Graph Neural Network ExplanationsSubjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Graph neural networks (GNNs) are powerful graph-based machine-learning models that are popular in various domains, e.g., social media, transportation, and drug discovery. However, owing to complex data representations, GNNs do not easily allow for human-intelligible explanations of their predictions, which can decrease trust in them as well as deter any collaboration opportunities between the AI expert and non-technical, domain expert. Here, we first discuss the two papers that aim to provide GNN explanations to domain experts in an accessible manner and then establish a set of design requirements for human-centered GNN explanations. Finally, we offer two example prototypes to demonstrate some of those proposed requirements.
- [132] arXiv:2405.06918 [pdf, ps, html, other]
-
Title: Super-Resolving Blurry Images with EventsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Super-resolution from motion-blurred images poses a significant challenge due to the combined effects of motion blur and low spatial resolution. To address this challenge, this paper introduces an Event-based Blurry Super Resolution Network (EBSR-Net), which leverages the high temporal resolution of events to mitigate motion blur and improve high-resolution image prediction. Specifically, we propose a multi-scale center-surround event representation to fully capture motion and texture information inherent in events. Additionally, we design a symmetric cross-modal attention module to fully exploit the complementarity between blurry images and events. Furthermore, we introduce an intermodal residual group composed of several residual dense Swin Transformer blocks, each incorporating multiple Swin Transformer layers and a residual connection, to extract global context and facilitate inter-block feature aggregation. Extensive experiments show that our method compares favorably against state-of-the-art approaches and achieves remarkable performance.
- [133] arXiv:2405.06919 [pdf, ps, other]
-
Title: Automating Thematic Analysis: How LLMs Analyse Controversial TopicsAwais Hameed Khan, Hiruni Kegalle, Rhea D'Silva, Ned Watt, Daniel Whelan-Shamy, Lida Ghahremanlou, Liam MageeComments: 18 pages, 6 figuresSubjects: Computers and Society (cs.CY); Computation and Language (cs.CL)
Large Language Models (LLMs) are promising analytical tools. They can augment human epistemic, cognitive and reasoning abilities, and support 'sensemaking', making sense of a complex environment or subject by analysing large volumes of data with a sensitivity to context and nuance absent in earlier text processing systems. This paper presents a pilot experiment that explores how LLMs can support thematic analysis of controversial topics. We compare how human researchers and two LLMs GPT-4 and Llama 2 categorise excerpts from media coverage of the controversial Australian Robodebt scandal. Our findings highlight intriguing overlaps and variances in thematic categorisation between human and machine agents, and suggest where LLMs can be effective in supporting forms of discourse and thematic analysis. We argue LLMs should be used to augment, and not replace human interpretation, and we add further methodological insights and reflections to existing research on the application of automation to qualitative research methods. We also introduce a novel card-based design toolkit, for both researchers and practitioners to further interrogate LLMs as analytical tools.
- [134] arXiv:2405.06922 [pdf, ps, html, other]
-
Title: EmoMix-3L: A Code-Mixed Dataset for Bangla-English-Hindi Emotion DetectionSubjects: Computation and Language (cs.CL)
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech. Several studies have been conducted on building datasets and performing downstream NLP tasks on code-mixed data. Although it is not uncommon to observe code-mixing of three or more languages, most available datasets in this domain contain code-mixed data from only two languages. In this paper, we introduce EmoMix-3L, a novel multi-label emotion detection dataset containing code-mixed data from three different languages. We experiment with several models on EmoMix-3L and we report that MuRIL outperforms other models on this dataset.
- [135] arXiv:2405.06925 [pdf, ps, html, other]
-
Title: Semi-supervised Anomaly Detection via Adaptive Reinforcement Learning-Enabled Method with Causal InferenceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Semi-supervised anomaly detection for guaranteeing the reliability of intelligent systems has received increasing attention. However, existing methods rely too much on data correlation and neglect causality, which can be misleading due to confounding factors and affect system reliability. Additionally, the current reinforcement learning anomaly detection methods can effectively identify known and unknown anomalies in environments with limited labeled samples. Despite its effectiveness, these methods still face several challenges, such as under-utilization of priori knowledge, lack of model flexibility, and insufficient reward feedback when interacting with the environment. To address the above problems, this paper innovatively constructs a counterfactual causal reinforcement learning model, termed Triple-Assisted Causal Reinforcement Learning Anomaly Detector (Tri-CRLAD). The model utilizes the causal inference mechanism to radically improve the performance of semi-supervised models and enhance the model's ability to uncover anomaly data in the face of unknown or rare data. In addition, Tri-CRLAD features a triple decision support mechanism, namely, a sampling strategy based on historical similarity, an adaptive threshold smoothing adjustment strategy, and an adaptive decision reward mechanism. These mechanisms further enhance the flexibility and generalization ability of the model, enabling it to effectively respond to various complex and dynamically changing environments. Finally, Tri-CRLAD matches or exceeds the performance of 9 baseline methods across 7 diverse intelligent system datasets, including satellite systems, medical systems, and health systems. Moreover, anomaly detection stability was significantly improved by up to 23\% with an extremely small number of known anomaly samples. Our code is available at this https URL
- [136] arXiv:2405.06926 [pdf, ps, html, other]
-
Title: TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable PromptComments: Accepted for publication at IJCAI 2024; 13 pages; 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
The recent introduction of prompt tuning based on pre-trained vision-language models has dramatically improved the performance of multi-label image classification. However, some existing strategies that have been explored still have drawbacks, i.e., either exploiting massive labeled visual data at a high cost or using text data only for text prompt tuning and thus failing to learn the diversity of visual knowledge. Hence, the application scenarios of these methods are limited. In this paper, we propose a pseudo-visual prompt~(PVP) module for implicit visual prompt tuning to address this problem. Specifically, we first learn the pseudo-visual prompt for each category, mining diverse visual knowledge by the well-aligned space of pre-trained vision-language models. Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities. Experimental results on VOC2007, MS-COCO, and NUSWIDE datasets demonstrate that our method can surpass state-of-the-art~(SOTA) methods across various settings for multi-label image classification tasks. The code is available at this https URL.
- [137] arXiv:2405.06927 [pdf, ps, html, other]
-
Title: Multimodal Pretraining and Generation for Recommendation: A TutorialComments: Published in WWW 2024 Tutorial. Find the tutorial materials at this https URLSubjects: Information Retrieval (cs.IR)
Personalized recommendation stands as a ubiquitous channel for users to explore information or items aligned with their interests. Nevertheless, prevailing recommendation models predominantly rely on unique IDs and categorical features for user-item matching. While this ID-centric approach has witnessed considerable success, it falls short in comprehensively grasping the essence of raw item contents across diverse modalities, such as text, image, audio, and video. This underutilization of multimodal data poses a limitation to recommender systems, particularly in the realm of multimedia services like news, music, and short-video platforms. The recent surge in pretraining and generation techniques presents both opportunities and challenges in the development of multimodal recommender systems. This tutorial seeks to provide a thorough exploration of the latest advancements and future trajectories in multimodal pretraining and generation techniques within the realm of recommender systems. The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications and open challenges in the field of recommendation. Our target audience encompasses scholars, practitioners, and other parties interested in this domain. By providing a succinct overview of the field, we aspire to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape.
- [138] arXiv:2405.06928 [pdf, ps, other]
-
Title: Systematic Review of Extended Reality for Smart Built Environments Lighting Design SimulationsJournal-ref: IEEE Access 2024Subjects: Human-Computer Interaction (cs.HC)
This systematic literature review paper explores the use of extended reality {(XR)} technology for smart built environments and particularly for smart lighting systems design. Smart lighting is a novel concept that has emerged over a decade now and is being used and tested in commercial and industrial built environments. We used PRISMA methodology to review 270 research papers published from 1968 to 2023. Following a discussion of historical advances and key modeling techniques, a description of lighting simulation in the context of extended reality and smart built environment is given, followed by a discussion of the current trends and challenges.
- [139] arXiv:2405.06929 [pdf, ps, html, other]
-
Title: PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action RecognitionComments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recognizing human actions from point cloud sequence has attracted tremendous attention from both academia and industry due to its wide applications. However, most previous studies on point cloud action recognition typically require complex networks to extract intra-frame spatial features and inter-frame temporal features, resulting in an excessive number of redundant computations. This leads to high latency, rendering them impractical for real-world applications. To address this problem, we propose a Plane-Fit Redundancy Encoding point cloud sequence network named PRENet. The primary concept of our approach involves the utilization of plane fitting to mitigate spatial redundancy within the sequence, concurrently encoding the temporal redundancy of the entire sequence to minimize redundant computations. Specifically, our network comprises two principal modules: a Plane-Fit Embedding module and a Spatio-Temporal Consistency Encoding module. The Plane-Fit Embedding module capitalizes on the observation that successive point cloud frames exhibit unique geometric features in physical space, allowing for the reuse of spatially encoded data for temporal stream encoding. The Spatio-Temporal Consistency Encoding module amalgamates the temporal structure of the temporally redundant part with its corresponding spatial arrangement, thereby enhancing recognition accuracy. We have done numerous experiments to verify the effectiveness of our network. The experimental results demonstrate that our method achieves almost identical recognition accuracy while being nearly four times faster than other state-of-the-art methods.
- [140] arXiv:2405.06930 [pdf, ps, html, other]
-
Title: Extended Reality for Smart Built Environments Design: Smart Lighting Design TestbedJournal-ref: HCI International 2022Subjects: Human-Computer Interaction (cs.HC)
Smart Built Environment is an eco-system of `connected' and `smart' Internet of Things (IoT) devices that are embedded in a built environment. Smart lighting is an important category of smart IoT devices that has recently attracted research interest, particularly for residential areas. In this paper, we present an extended reality based smart lighting design testbed that can generate design prototypes based on the functionality of the physical environment. The emphasis is on designing a smart lighting system in a controlled residential environment, with some evaluation of well-being and comfort.
- [141] arXiv:2405.06931 [pdf, ps, html, other]
-
Title: Identifying Key Terms in Prompts for Relevance Evaluation with GPT ModelsComments: 19pages, 2 figuresJournal-ref: International Journal of Natural Language Computing, April 2024, Volume 13, Number 2Subjects: Information Retrieval (cs.IR)
Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have been conducted on tasks related to relevance judgment using Large Language Models (LLMs) such as GPT-4, demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with LLMs. We employed two types of prompts: those used in previous research and generated automatically by LLMs. By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts. We have observed two main findings from our study. First, we discovered that prompts using the term answerlead to more effective relevance evaluations than those using relevant. This indicates that a more direct approach, focusing on answering the query, tends to enhance performance. Second, we noted the importance of appropriately balancing the scope of relevance. While the term relevant can extend the scope too broadly, resulting in less precise evaluations, an optimal balance in defining relevance is crucial for accurate assessments. The inclusion of few-shot examples helps in more precisely defining this balance. By providing clearer contexts for the term relevance, few-shot examples contribute to refine relevance criteria. In conclusion, our study highlights the significance of carefully selecting terms in prompts for relevance evaluation with LLMs.
- [142] arXiv:2405.06932 [pdf, ps, html, other]
-
Title: Piccolo2: General Text Embedding with Multi-task Hybrid Loss TrainingComments: tech reportSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this report, we introduce Piccolo2, an embedding model that surpasses other models in the comprehensive evaluation over 6 tasks on CMTEB benchmark, setting a new state-of-the-art. Piccolo2 primarily leverages an efficient multi-task hybrid loss training approach, effectively harnessing textual data and labels from diverse downstream tasks. In addition, Piccolo2 scales up the embedding dimension and uses MRL training to support more flexible vector dimensions. The latest information of piccolo models can be accessed via: this https URL
- [143] arXiv:2405.06933 [pdf, ps, other]
-
Title: Syndrome-based Fusion Rules in Heterogeneous Distributed Quickest Change DetectionSubjects: Information Theory (cs.IT)
In this paper, the heterogeneous distributed quickest change detection (HetDQCD) with 1-bit non-anonymous feedback is studied. The concept of syndromes is introduced and the family of syndrome-based fusion rules is proposed, which encompasses all deterministic fusion rules as special cases. Through the Hasse diagram of syndromes, upper and lower bounds on the second-order performance of expected detection delay as a function of average run length to false alarm are provided. An interesting instance, the weighted voting rule previously proposed in our prior work, is then revisited, for which an efficient pruning method for breadth-first search in the Hasse diagram is proposed to analyze the performance. This in turn assists in the design of the weight threshold in the weighted voting rule. Simulation results corroborate that our analysis is instrumental in identifying a proper design for the weighted voting rule, demonstrating consistent superiority over both the anonymous voting rule and the group selection rule in HetDQCD.
- [144] arXiv:2405.06937 [pdf, ps, other]
-
Title: High-Order Synchrosqueezed Chirplet Transforms for Multicomponent Signal AnalysisSubjects: Numerical Analysis (math.NA); Signal Processing (eess.SP)
This study focuses on the analysis of signals containing multiple components with crossover instantaneous frequencies (IF). This problem was initially solved with the chirplet transform (CT). Also, it can be sharpened by adding the synchrosqueezing step, which is called the synchrosqueezed chirplet transform (SCT). However, we found that the SCT goes wrong with the high chirp modulation signal due to the wrong estimation of the IF. In this paper, we present the improvement of the post-transformation of the CT. The main goal of this paper is to amend the estimation introduced in the SCT and carry out the high-order synchrosqueezed chirplet transform. The proposed method reduces the wrong estimation when facing a stronger variety of chirp-modulated multi-component signals. The theoretical analysis of the new reassignment ingredient is provided. Numerical experiments on some synthetic signals are presented to verify the effectiveness of the proposed high-order SCT.
- [145] arXiv:2405.06944 [pdf, ps, html, other]
-
Title: Learning Monocular Depth from Focus with Event Focal StackSubjects: Computer Vision and Pattern Recognition (cs.CV)
Depth from Focus estimates depth by determining the moment of maximum focus from multiple shots at different focal distances, i.e. the Focal Stack. However, the limited sampling rate of conventional optical cameras makes it difficult to obtain sufficient focus cues during the focal sweep. Inspired by biological vision, the event camera records intensity changes over time in extremely low latency, which provides more temporal information for focus time acquisition. In this study, we propose the EDFF Network to estimate sparse depth from the Event Focal Stack. Specifically, we utilize the event voxel grid to encode intensity change information and project event time surface into the depth domain to preserve per-pixel focal distance information. A Focal-Distance-guided Cross-Modal Attention Module is presented to fuse the information mentioned above. Additionally, we propose a Multi-level Depth Fusion Block designed to integrate results from each level of a UNet-like architecture and produce the final output. Extensive experiments validate that our method outperforms existing state-of-the-art approaches.
- [146] arXiv:2405.06945 [pdf, ps, other]
-
Title: Direct Learning of Mesh and Appearance via 3D Gaussian SplattingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Accurately reconstructing a 3D scene including explicit geometry information is both attractive and challenging. Geometry reconstruction can benefit from incorporating differentiable appearance models, such as Neural Radiance Fields and 3D Gaussian Splatting (3DGS). In this work, we propose a learnable scene model that incorporates 3DGS with an explicit geometry representation, namely a mesh. Our model learns the mesh and appearance in an end-to-end manner, where we bind 3D Gaussians to the mesh faces and perform differentiable rendering of 3DGS to obtain photometric supervision. The model creates an effective information pathway to supervise the learning of the scene, including the mesh. Experimental results demonstrate that the learned scene model not only achieves state-of-the-art rendering quality but also supports manipulation using the explicit mesh. In addition, our model has a unique advantage in adapting to scene updates, thanks to the end-to-end learning of both mesh and appearance.
- [147] arXiv:2405.06948 [pdf, ps, other]
-
Title: Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image GenerationComments: 26 pages, 13 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. For generating compositional subjects, it often encounters problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these limitations, we propose a subject-driven generation framework and introduce training-free guidance to intervene in the generative process during inference time. This approach strengthens the attention map, allowing for precise attribute binding and feature injection for each subject. Notably, our method exhibits exceptional zero-shot generation ability, especially in the challenging task of compositional generation. Furthermore, we propose a novel metric GroundingScore to evaluate subject alignment thoroughly. The obtained quantitative results serve as compelling evidence showcasing the effectiveness of our proposed method. The code will be released soon.
- [148] arXiv:2405.06951 [pdf, ps, html, other]
-
Title: Intelligent Reflecting Surface-Aided Radar SpoofingComments: 5 pages, 4 figuresSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Electronic countermeasure (ECM) technology plays a critical role in modern electronic warfare, which can interfere with enemy radar detection systems by noise or deceptive signals. However, the conventional active jamming strategy incurs additional hardware and power costs and has the potential threat of exposing the target itself. To tackle the above challenges, we propose a new intelligent reflecting surface (IRS)-aided radar spoofing strategy in this letter, where IRS is deployed on the surface of a target to help eliminate the signals reflected towards the hostile radar to shield the target, while simultaneously redirecting its reflected signal towards a surrounding clutter to generate deceptive angle-of-arrival (AoA) sensing information for the radar. We optimize the IRS's reflection to maximize the received signal power at the radar from the direction of the selected clutter subject to the constraint that its received power from the direction of the target is lower than a given detection threshold. We first solve this non-convex optimization problem using the semidefinite relaxation (SDR) method and further propose a lower-complexity solution for real-time implementation. Simulation results validate the efficacy of our proposed IRS-aided spoofing system as compared to various benchmark schemes.
- [149] arXiv:2405.06954 [pdf, ps, other]
-
Title: A Convergence Theorem for the Parareal Algorithm RevisitedComments: 14 pagesSubjects: Numerical Analysis (math.NA)
The subject of the paper is to verify the convergence conditions for the parareal algorithm using Gander and Hairer's theorem . The analysis is conducted in the case where the coarse integrator is the Euler method and the high-accuracy integrator is an explicit Runge-Kutta type method.
- [150] arXiv:2405.06957 [pdf, ps, html, other]
-
Title: On the Role of Intelligence and Business Wargaming in Developing ForesightComments: 18 pages, 3 figures. The authors declare no potential conflicts of interestSubjects: Computers and Society (cs.CY)
Business wargaming is a central tool for developing sustaining strategies. It transfers the benefits of traditional wargaming to the business environment. However, building wargames that support the process of decision-making for strategy require respective intelligence. This paper investigates the role of intelligence in the process of developing strategic foresight. The focus is on how intelligence is developed and how it relates to business wargaming. The so-called intelligence cycle is the basis and reference of our investigation.
The conceptual part of the paper combines the theoretical background from military, business as well as serious gaming. To elaborate on some of the lessons learned, we examine specific business wargames both drawn from the literature and conducted by us at the Center for Intelligence and Security Studies (CISS). It is shown that business wargaming can make a significant contribution to the transformation of data to intelligence by supporting the intelligence cycle in two crucial phases. Furthermore, it brings together business intelligence (BI) and competitive intelligence (CI) and it bridges the gap to a company's strategy by either testing or developing a new strategy. We were also able to confirm this finding based on the business wargame we conducted at a major semiconductor manufacturer. - [151] arXiv:2405.06959 [pdf, ps, other]
-
Title: AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose EstimationComments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA),7 pages, 3 figuresSubjects: Robotics (cs.RO)
To address the limitations inherent to conventional automated harvesting robots specifically their suboptimal success rates and risk of crop damage, we design a novel bot named AHPPEBot which is capable of autonomous harvesting based on crop phenotyping and pose estimation. Specifically, In phenotyping, the detection, association, and maturity estimation of tomato trusses and individual fruits are accomplished through a multi-task YOLOv5 model coupled with a detection-based adaptive DBScan clustering algorithm. In pose estimation, we employ a deep learning model to predict seven semantic keypoints on the pedicel. These keypoints assist in the robot's path planning, minimize target contact, and facilitate the use of our specialized end effector for harvesting. In autonomous tomato harvesting experiments conducted in commercial greenhouses, our proposed robot achieved a harvesting success rate of 86.67%, with an average successful harvest time of 32.46 s, showcasing its continuous and robust harvesting capabilities. The result underscores the potential of harvesting robots to bridge the labor gap in agriculture.
- [152] arXiv:2405.06964 [pdf, ps, other]
-
Title: ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and RobotsZhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianyi Chen, Zhouliang Yu, Lin ShaoSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to develop a foundation model for general robotic manipulation that formalizes a manipulation task as contact synthesis. Specifically, our model takes as input object and robot manipulator point clouds, object physical attributes, target motions, and manipulation region masks. It outputs contact points on the object and associated contact forces or post-contact motions for robots to achieve the desired manipulation task. We perform extensive experiments both in the simulation and real-world settings, manipulating articulated rigid objects, rigid objects, and deformable objects that vary in dimensionality, ranging from one-dimensional objects like ropes to two-dimensional objects like cloth and extending to three-dimensional objects such as plasticine. Our model achieves average success rates of around 90\%. Supplementary materials and videos are available on our project website at this https URL.
- [153] arXiv:2405.06965 [pdf, ps, other]
-
Title: A De-singularity Subgradient Approach for the Extended Weber Location ProblemComments: IJCAI 2024Subjects: Machine Learning (cs.LG)
The extended Weber location problem is a classical optimization problem that has inspired some new works in several machine learning scenarios recently. However, most existing algorithms may get stuck due to the singularity at the data points when the power of the cost function $1\leqslant q<2$, such as the widely-used iterative Weiszfeld approach. In this paper, we establish a de-singularity subgradient approach for this problem. We also provide a complete proof of convergence which has fixed some incomplete statements of the proofs for some previous Weiszfeld algorithms. Moreover, we deduce a new theoretical result of superlinear convergence for the iteration sequence in a special case where the minimum point is a singular point. We conduct extensive experiments in a real-world machine learning scenario to show that the proposed approach solves the singularity problem, produces the same results as in the non-singularity cases, and shows a reasonable rate of linear convergence. The results also indicate that the $q$-th power case ($1<q<2$) is more advantageous than the $1$-st power case and the $2$-nd power case in some situations. Hence the de-singularity subgradient approach is beneficial to advancing both theory and practice for the extended Weber location problem.
- [154] arXiv:2405.06967 [pdf, ps, other]
-
Title: Optimal Configuration of Reconfigurable Intelligent Surfaces With Non-uniform Phase QuantizationSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The existing methods for Reconfigurable Intelligent Surface (RIS) beamforming in wireless communication are typically limited to uniform phase quantization. However, in real world applications, the phase and bit resolution of RIS units are often non-uniform due to practical requirements and engineering challenges. To fill this research gap, we formulate an optimization problem for discrete non-uniform phase configuration in RIS assisted multiple-input single-output (MISO) communications. Subsequently, a partition-and-traversal (PAT) algorithm is proposed to solve that, achieving the global optimal solution. The efficacy and superiority of the PAT algorithm are validated through numerical simulations, and the impact of non-uniform phase quantization on system performance is analyzed.
- [155] arXiv:2405.06971 [pdf, ps, other]
-
Title: Controlling network-coupled neural dynamics with nonlinear network control theorySubjects: Systems and Control (eess.SY)
This paper addresses the problem of controlling the temporal dynamics of complex nonlinear network-coupled dynamical systems, specifically in terms of neurodynamics. Based on the Lyapunov direct method, we derive a control strategy with theoretical guarantees of controllability. To verify the performance of the derived control strategy, we perform numerical experiments on two nonlinear network-coupled dynamical systems that emulate phase synchronization and neural population dynamics. The results demonstrate the feasibility and effectiveness of our control strategy.
- [156] arXiv:2405.06972 [pdf, ps, other]
-
Title: A Machine Learning-based Approach for Solving Recurrence Relations and its use in Cost Analysis of Logic ProgramsComments: arXiv admin note: text overlap with arXiv:2309.07259Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)
Automatic static cost analysis infers information about the resources used by programs without actually running them with concrete data, and presents such information as functions of input data sizes. Most of the analysis tools for logic programs (and many for other languages), as CiaoPP, are based on setting up recurrence relations representing (bounds on) the computational cost of predicates, and solving them to find closed-form functions. Such recurrence solving is a bottleneck in current tools: many of the recurrences that arise during the analysis cannot be solved with state-of-the-art solvers, including Computer Algebra Systems (CASs), so that specific methods for different classes of recurrences need to be developed. We address such a challenge by developing a novel, general approach for solving arbitrary, constrained recurrence relations, that uses machine-learning (sparse-linear and symbolic) regression techniques to guess a candidate closed-form function, and a combination of an SMT-solver and a CAS to check if it is actually a solution of the recurrence. Our prototype implementation and its experimental evaluation within the context of the CiaoPP system show quite promising results. Overall, for the considered benchmarks, our approach outperforms state-of-the-art cost analyzers and recurrence solvers, and solves recurrences that cannot be solved by them.
- [157] arXiv:2405.06973 [pdf, ps, html, other]
-
Title: A Primer for Preferential Non-Monotonic Propositional Team LogicsSubjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
This paper considers KLM-style preferential non-monotonic reasoning in the setting of propositional team semantics. We show that team-based propositional logics naturally give rise to cumulative non-monotonic entailment relations. Motivated by the non-classical interpretation of disjunction in team semantics, we give a precise characterization for preferential models for propositional dependence logic satisfying all of System P postulates. Furthermore, we show how classical entailment and dependence logic entailment can be expressed in terms of non-trivial preferential models.
- [158] arXiv:2405.06975 [pdf, ps, html, other]
-
Title: Input Snapshots Fusion for Scalable Discrete Dynamic Graph Nerual NetworksSubjects: Machine Learning (cs.LG)
Dynamic graphs are ubiquitous in the real world, yet there is a lack of suitable theoretical frameworks to effectively extend existing static graph models into the temporal domain. Additionally, for link prediction tasks on discrete dynamic graphs, the requirement of substantial GPU memory to store embeddings of all nodes hinders the scalability of existing models. In this paper, we introduce an Input {\bf S}napshots {\bf F}usion based {\bf Dy}namic {\bf G}raph Neural Network (SFDyG). By eliminating the partitioning of snapshots within the input window, we obtain a multi-graph (more than one edge between two nodes). Subsequently, by introducing a graph denoising problem with the assumption of temporal decayed smoothing, we integrate Hawkes process theory into Graph Neural Networks to model the generated multi-graph. Furthermore, based on the multi-graph, we propose a scalable three-step mini-batch training method and demonstrate its equivalence to full-batch training counterpart. Our experiments, conducted on eight distinct dynamic graph datasets for future link prediction tasks, revealed that SFDyG generally surpasses related methods.
- [159] arXiv:2405.06977 [pdf, ps, other]
-
Title: The Sample Complexity of Stackelberg GamesSubjects: Computer Science and Game Theory (cs.GT)
Stackelberg games (SGs) constitute the most fundamental and acclaimed models of strategic interactions involving some form of commitment. Moreover, they form the basis of more elaborate models of this kind, such as, e.g., Bayesian persuasion and principal-agent problems. Addressing learning tasks in SGs and related models is crucial to operationalize them in practice, where model parameters are usually unknown. In this paper, we revise the sample complexity of learning an optimal strategy to commit to in SGs. We provide a novel algorithm that (i) does not require any of the limiting assumptions made by state-of-the-art approaches and (ii) deals with a trade-off between sample complexity and termination probability arising when leader's strategies representation has finite precision. Such a trade-off has been completely neglected by existing algorithms and, if not properly managed, it may result in them using exponentially-many samples. Our algorithm requires novel techniques, which also pave the way to addressing learning problems in other models with commitment ubiquitous in the real world.
- [160] arXiv:2405.06978 [pdf, ps, html, other]
-
Title: On User Association in Large-Scale Heterogeneous LEO Satellite NetworkSubjects: Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI)
In this paper, we investigate the performance of large-scale heterogeneous low Earth orbit (LEO) satellite networks in the context of three association schemes. In contrast to existing studies, where single-tier LEO satellite-based network deployments are considered, the developed framework captures the heterogeneous nature of real-world satellite network deployments. More specifically, we propose an analytical framework to evaluate the performance of multi-tier LEO satellite-based networks, where the locations of LEO satellites are approximated as points of independent Poisson point processes, with different density, transmit power, and altitude. We propose three association schemes for the considered network topology based on: 1) the Euclidean distance, 2) the average received power, and 3) a random selection. By using stochastic geometry tools, analytical expressions for the association probability, the downlink coverage probability, as well as the spectral efficiency are derived for each association scheme, where the interference is considered. Moreover, we assess the achieved network performance under several different fading environments, including low, typical, and severe fading conditions, namely non-fading, shadowed-Rician and Rayleigh fading channels, respectively. Our results reveal the impact of fading channels on the coverage probability, and illustrate that the average power-based association scheme outperforms in terms of achieved coverage and spectral efficiency performance against the other two association policies. Furthermore, we highlight the impact of the proposed association schemes and the network topology on the optimal number of LEO satellites, providing guidance for the planning of multi-tier LEO satellite-based networks in order to enhance network performance.
- [161] arXiv:2405.06979 [pdf, ps, other]
-
Title: Robust Semi-supervised Learning by Wisely Leveraging Open-set DataSubjects: Machine Learning (cs.LG)
Open-set Semi-supervised Learning (OSSL) holds a realistic setting that unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, except for the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.
- [162] arXiv:2405.06980 [pdf, ps, other]
-
Title: Fractals as Pre-training Datasets for Anomaly Detection and LocalizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Anomaly detection is crucial in large-scale industrial manufacturing as it helps detect and localise defective parts. Pre-training feature extractors on large-scale datasets is a popular approach for this task. Stringent data security and privacy regulations and high costs and acquisition time hinder the availability and creation of such large datasets. While recent work in anomaly detection primarily focuses on the development of new methods built on such extractors, the importance of the data used for pre-training has not been studied. Therefore, we evaluated the performance of eight state-of-the-art methods pre-trained using dynamically generated fractal images on the famous benchmark datasets MVTec and VisA. In contrast to existing literature, which predominantly examines the transfer-learning capabilities of fractals, in this study, we compare models pre-trained with fractal images against those pre-trained with ImageNet, without subsequent fine-tuning. Although pre-training with ImageNet remains a clear winner, the results of fractals are promising considering that the anomaly detection task required features capable of discerning even minor visual variations. This opens up the possibility for a new research direction where feature extractors could be trained on synthetically generated abstract datasets reconciling the ever-increasing demand for data in machine learning while circumventing privacy and security concerns.
- [163] arXiv:2405.06981 [pdf, ps, other]
-
Title: AraSpell: A Deep Learning Approach for Arabic Spelling CorrectionSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Spelling correction is the task of identifying spelling mistakes, typos, and grammatical mistakes in a given text and correcting them according to their context and grammatical structure. This work introduces "AraSpell," a framework for Arabic spelling correction using different seq2seq model architectures such as Recurrent Neural Network (RNN) and Transformer with artificial data generation for error injection, trained on more than 6.9 Million Arabic sentences. Thorough experimental studies provide empirical evidence of the effectiveness of the proposed approach, which achieved 4.8% and 1.11% word error rate (WER) and character error rate (CER), respectively, in comparison with labeled data of 29.72% WER and 5.03% CER. Our approach achieved 2.9% CER and 10.65% WER in comparison with labeled data of 10.02% CER and 50.94% WER. Both of these results are obtained on a test set of 100K sentences.
- [164] arXiv:2405.06983 [pdf, ps, other]
-
Title: ISAC-Assisted Wireless Rechargeable Sensor Networks with Multiple Mobile Charging VehiclesComments: Accepted for publication in the Special Issue Q1'2024, "Integrating Sensing and Communication for Ubiquitous Internet of Things," IEEE Internet of Things MagazineSubjects: Networking and Internet Architecture (cs.NI)
As IoT-based wireless sensor networks (WSNs) become more prevalent, the issue of energy shortages becomes more pressing. One potential solution is the use of wireless power transfer (WPT) technology, which is the key to building a new shape of wireless rechargeable sensor networks (WRSNs). However, efficient charging and scheduling are critical for WRSNs to function properly. Motivated by the fact that probabilistic techniques can help enhance the effectiveness of charging scheduling for WRSNs, this article addresses the aforementioned issue and proposes a novel ISAC-assisted WRSN protocol. In particular, our proposed protocol considers several factors to balance the charging load on each mobile charging vehicle (MCV), uses an efficient charging factor strategy to partially charge network devices, and employs the ISAC concept to reduce the traveling cost of each MCV and prevent charging conflicts. Simulation results demonstrate that this protocol outperforms other classic, cutting-edge protocols in multiple areas.
- [165] arXiv:2405.06985 [pdf, ps, other]
-
Title: RoTHP: Rotary Position Embedding-based Transformer Hawkes ProcessSubjects: Machine Learning (cs.LG)
Temporal Point Processes (TPPs), especially Hawkes Process are commonly used for modeling asynchronous event sequences data such as financial transactions and user behaviors in social networks. Due to the strong fitting ability of neural networks, various neural Temporal Point Processes are proposed, among which the Neural Hawkes Processes based on self-attention such as Transformer Hawkes Process (THP) achieve distinct performance improvement. Although the THP has gained increasing studies, it still suffers from the {sequence prediction issue}, i.e., training on history sequences and inferencing about the future, which is a prevalent paradigm in realistic sequence analysis tasks. What's more, conventional THP and its variants simply adopt initial sinusoid embedding in transformers, which shows performance sensitivity to temporal change or noise in sequence data analysis by our empirical study. To deal with the problems, we propose a new Rotary Position Embedding-based THP (RoTHP) architecture in this paper. Notably, we show the translation invariance property and {sequence prediction flexibility} of our RoTHP induced by the {relative time embeddings} when coupled with Hawkes process theoretically. Furthermore, we demonstrate empirically that our RoTHP can be better generalized in sequence data scenarios with timestamp translations and in sequence prediction tasks.
- [166] arXiv:2405.06986 [pdf, ps, other]
-
Title: Revisiting the Efficacy of Signal Decomposition in AI-based Time Series PredictionSubjects: Machine Learning (cs.LG)
Time series prediction is a fundamental problem in scientific exploration and artificial intelligence (AI) technologies have substantially bolstered its efficiency and accuracy. A well-established paradigm in AI-driven time series prediction is injecting physical knowledge into neural networks through signal decomposition methods, and sustaining progress in numerous scenarios has been reported. However, we uncover non-negligible evidence that challenges the effectiveness of signal decomposition in AI-based time series prediction. We confirm that improper dataset processing with subtle future label leakage is unfortunately widely adopted, possibly yielding abnormally superior but misleading results. By processing data in a strictly causal way without any future information, the effectiveness of additional decomposed signals diminishes. Our work probably identifies an ingrained and universal error in time series modeling, and the de facto progress in relevant areas is expected to be revisited and calibrated to prevent future scientific detours and minimize practical losses.
- [167] arXiv:2405.06989 [pdf, ps, other]
-
Title: Stabilizing Circular Motion Within Nonconcentric Circular Boundary: A Mobius Transformation-Based ApproachSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Nonuniform motion constraints are ubiquitous in robotic applications. Geofencing control is one such paradigm where the motion of a robot must be constrained within a predefined boundary. This paper addresses the problem of stabilizing a unicycle robot around a desired circular orbit while confining its motion within a nonconcentric external circular boundary. Our solution approach relies on the concept of the so-called Mobius transformation that, under certain practical conditions, maps two nonconcentric circles to a pair of concentric circles, and hence, results in uniform spatial motion constraints. The choice of such a Mobius transformation is governed by the roots of a quadratic equation in the post-design analysis that decides how the regions enclosed by the two circles are mapped onto the two planes. We show that the problem can be formulated either as a trajectory-constraining problem or an obstacle-avoidance problem in the transformed plane, depending on these roots. Exploiting the idea of the barrier Lyapunov function, we propose a unique control law that solves both these contrasting problems in the transformed plane and renders a solution to the original problem in the actual plane. By relating parameters of two planes under Mobius transformation and its inverse map, we further establish a connection between the control laws in two planes and determine the control law to be applied in the actual plane. Simulation and experimental results are provided to illustrate the key theoretical developments.
- [168] arXiv:2405.06991 [pdf, ps, other]
-
Title: PIPE: Process Informed Parameter Estimation, a learning based approach to task generalized system identificationComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Robotics (cs.RO)
We address the problem of robot guided assembly tasks, by using a learning-based approach to identify contact model parameters for known and novel parts. First, a Variational Autoencoder (VAE) is used to extract geometric features of assembly parts. Then, we combine the extracted features with physical knowledge to derive the parameters of a contact model using our newly proposed neural network structure. The measured force from real experiments is used to supervise the predicted forces, thus avoiding the need for ground truth model parameters. Although trained only on a small set of assembly parts, good contact model estimation for unknown objects were achieved. Our main contribution is the network structure that allows us to estimate contact models of assembly tasks depending on the geometry of the part to be joined. Where current system identification processes have to record new data for a new assembly process, our method only requires the 3D model of the assembly part. We evaluate our method by estimating contact models for robot-guided assembly tasks of pin connectors as well as electronic plugs and compare the results with real experiments.
- [169] arXiv:2405.06992 [pdf, ps, other]
-
Title: ResSurv: Cancer Survival Analysis Prediction Model Based on Residual NetworksComments: 7pages, 7figuresSubjects: Machine Learning (cs.LG); Applications (stat.AP)
Survival prediction is an important branch of cancer prognosis analysis. The model that predicts survival risk through TCGA genomics data can discover genes related to cancer and provide diagnosis and treatment recommendations based on patient characteristics. We found that deep learning models based on Cox proportional hazards often suffer from overfitting when dealing with high-throughput data. Moreover, we found that as the number of network layers increases, the experimental results will not get better, and network degradation will occur. Based on this problem, we propose a new framework based on Deep Residual Learning. Combine the ideas of Cox proportional hazards and Residual. And name it ResSurv. First, ResSurv is a feed-forward deep learning network stacked by multiple basic ResNet Blocks. In each ResNet Block, we add a Normalization Layer to prevent gradient disappearance and gradient explosion. Secondly, for the loss function of the neural network, we inherited the Cox proportional hazards methods, applied the semi-parametric of the CPH model to the neural network, combined with the partial likelihood model, established the loss function, and performed backpropagation and gradient update. Finally, we compared ResSurv networks of different depths and found that we can effectively extract high-dimensional features. Ablation experiments and comparative experiments prove that our model has reached SOTA(state of the art) in the field of deep learning, and our network can effectively extract deep information.
- [170] arXiv:2405.06993 [pdf, ps, other]
-
Title: Robust Model Aggregation for Heterogeneous Federated Learning: Analysis and OptimizationsSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Conventional synchronous federated learning (SFL) frameworks suffer from performance degradation in heterogeneous systems due to imbalanced local data size and diverse computing power on the client side. To address this problem, asynchronous FL (AFL) and semi-asynchronous FL have been proposed to recover the performance loss by allowing asynchronous aggregation. However, asynchronous aggregation incurs a new problem of inconsistency between local updates and global updates. Motivated by the issues of conventional SFL and AFL, we first propose a time-driven SFL (T-SFL) framework for heterogeneous systems. The core idea of T-SFL is that the server aggregates the models from different clients, each with varying numbers of iterations, at regular time intervals. To evaluate the learning performance of T-SFL, we provide an upper bound on the global loss function. Further, we optimize the aggregation weights to minimize the developed upper bound. Then, we develop a discriminative model selection (DMS) algorithm that removes local models from clients whose number of iterations falls below a predetermined threshold. In particular, this algorithm ensures that each client's aggregation weight accurately reflects its true contribution to the global model update, thereby improving the efficiency and robustness of the system. To validate the effectiveness of T-SFL with the DMS algorithm, we conduct extensive experiments using several popular datasets including MNIST, Cifar-10, Fashion-MNIST, and SVHN. The experimental results demonstrate that T-SFL with the DMS algorithm can reduce the latency of conventional SFL by 50\%, while achieving an average 3\% improvement in learning accuracy over state-of-the-art AFL algorithms.
- [171] arXiv:2405.06994 [pdf, ps, other]
-
Title: GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution ShiftsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Neural Architecture Search (NAS) methods have shown to output networks that largely outperform human-designed networks. However, conventional NAS methods have mostly tackled the single dataset scenario, incuring in a large computational cost as the procedure has to be run from scratch for every new dataset. In this work, we focus on predictor-based algorithms and propose a simple and efficient way of improving their prediction performance when dealing with data distribution shifts. We exploit the Kronecker-product on the randomly wired search-space and create a small NAS benchmark composed of networks trained over four different datasets. To improve the generalization abilities, we propose GRASP-GCN, a ranking Graph Convolutional Network that takes as additional input the shape of the layers of the neural networks. GRASP-GCN is trained with the not-at-convergence accuracies, and improves the state-of-the-art of 3.3 % for Cifar-10 and increasing moreover the generalization abilities under data distribution shift.
- [172] arXiv:2405.06995 [pdf, ps, other]
-
Title: Benchmarking Cross-Domain Audio-Visual Deception DetectionComments: 10 pagesSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, that enables us to assess how well these methods generalize for use in real-world scenarios. We used widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further exploit the impacts using data from multiple source domains for training, we investigate three types of domain sampling strategies, including domain-simultaneous, domain-alternating, and domain-by-domain for multi-to-single domain generalization evaluation. Furthermore, we proposed the Attention-Mixer fusion method to improve performance, and we believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection. Protocols and source code are available at \href{this https URL}{this https URL\_domain\_DD}.
- [173] arXiv:2405.06996 [pdf, ps, other]
-
Title: Quite Good, but Not Enough: Nationality Bias in Large Language Models -- A Case Study of ChatGPTComments: Accepted by LREC-COLING 2024Subjects: Computation and Language (cs.CL)
While nationality is a pivotal demographic element that enhances the performance of language models, it has received far less scrutiny regarding inherent biases. This study investigates nationality bias in ChatGPT (GPT-3.5), a large language model (LLM) designed for text generation. The research covers 195 countries, 4 temperature settings, and 3 distinct prompt types, generating 4,680 discourses about nationality descriptions in Chinese and English. Automated metrics were used to analyze the nationality bias, and expert annotators alongside ChatGPT itself evaluated the perceived bias. The results show that ChatGPT's generated discourses are predominantly positive, especially compared to its predecessor, GPT-2. However, when prompted with negative inclinations, it occasionally produces negative content. Despite ChatGPT considering its generated text as neutral, it shows consistent self-awareness about nationality bias when subjected to the same pair-wise comparison annotation framework used by human annotators. In conclusion, while ChatGPT's generated texts seem friendly and positive, they reflect the inherent nationality biases in the real world. This bias may vary across different language versions of ChatGPT, indicating diverse cultural perspectives. The study highlights the subtle and pervasive nature of biases within LLMs, emphasizing the need for further scrutiny.
- [174] arXiv:2405.06997 [pdf, ps, other]
-
Title: Path Guiding for Wavefront Path Tracing: A Memory Efficient Approach for GPU Path TracersBora Yalçıner (1), Ahmet Oğuz Akyüz (1) ((1) Middle East Technical University, Computer Engineering Department, Ankara, Turkey)Comments: 14 pages, 11 figuresSubjects: Graphics (cs.GR)
We propose a path-guiding algorithm to be incorporated into the wavefront style of path tracers (WFPTs). As WFPTs are primarily implemented on graphics processing units (GPUs), the proposed method aims to leverage the capabilities of the GPUs and reduce the hierarchical data structure and memory usage typically required for such techniques. To achieve this, our algorithm only stores the radiant exitance on a single global sparse voxel octree (SVO) data structure. Probability density functions required to guide the rays are generated on-the-fly using this data structure. The proposed approach reduces the scene-related persistent memory requirements compared to other path-guiding techniques while producing similar or better results depending on scene characteristics. To our knowledge, our algorithm is the first one that incorporates path guiding into a WFPT.
- [175] arXiv:2405.06999 [pdf, ps, html, other]
-
Title: Large Language Model-aided Edge Learning in Distribution System State EstimationSubjects: Systems and Control (eess.SY)
Distribution system state estimation (DSSE) plays a crucial role in the real-time monitoring, control, and operation of distribution networks. Besides intensive computational requirements, conventional DSSE methods need high-quality measurements to obtain accurate states, whereas missing values often occur due to sensor failures or communication delays. To address these challenging issues, a forecast-then-estimate framework of edge learning is proposed for DSSE, leveraging large language models (LLMs) to forecast missing measurements and provide pseudo-measurements. Firstly, natural language-based prompts and measurement sequences are integrated by the proposed LLM to learn patterns from historical data and provide accurate forecasting results. Secondly, a convolutional layer-based neural network model is introduced to improve the robustness of state estimation under missing measurement. Thirdly, to alleviate the overfitting of the deep learning-based DSSE, it is reformulated as a multi-task learning framework containing shared and task-specific layers. The uncertainty weighting algorithm is applied to find the optimal weights to balance different tasks. The numerical simulation on the Simbench case is used to demonstrate the effectiveness of the proposed forecast-then-estimate framework.
- [176] arXiv:2405.07001 [pdf, ps, other]
-
Title: Evaluating Task-based Effectiveness of MLLMs on ChartsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we explore a forward-thinking question: Is GPT-4V effective at low-level data analysis tasks on charts? To this end, we first curate a large-scale dataset, named ChartInsights, consisting of 89,388 quartets (chart, task, question, answer) and covering 10 widely-used low-level data analysis tasks on 7 chart types. Firstly, we conduct systematic evaluations to understand the capabilities and limitations of 18 advanced MLLMs, which include 12 open-source models and 6 closed-source models. Starting with a standard textual prompt approach, the average accuracy rate across the 18 MLLMs is 36.17%. Among all the models, GPT-4V achieves the highest accuracy, reaching 56.13%. To understand the limitations of multimodal large models in low-level data analysis tasks, we have designed various experiments to conduct an in-depth test of capabilities of GPT-4V. We further investigate how visual modifications to charts, such as altering visual elements (e.g. changing color schemes) and introducing perturbations (e.g. adding image noise), affect performance of GPT-4V. Secondly, we present 12 experimental findings. These findings suggest potential of GPT-4V to revolutionize interaction with charts and uncover the gap between human analytic needs and capabilities of GPT-4V. Thirdly, we propose a novel textual prompt strategy, named Chain-of-Charts, tailored for low-level analysis tasks, which boosts model performance by 24.36%, resulting in an accuracy of 80.49%. Furthermore, by incorporating a visual prompt strategy that directs attention of GPT-4V to question-relevant visual elements, we further improve accuracy to 83.83%. Our study not only sheds light on the capabilities and limitations of GPT-4V in low-level data analysis tasks but also offers valuable insights for future research.
- [177] arXiv:2405.07004 [pdf, ps, other]
-
Title: Stealthy Imitation: Reward-guided Environment-free Policy StealingComments: Accepted at ICML 2024. Project page: this https URLSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Deep reinforcement learning policies, which are integral to modern control systems, represent valuable intellectual property. The development of these policies demands considerable resources, such as domain expertise, simulation fidelity, and real-world validation. These policies are potentially vulnerable to model stealing attacks, which aim to replicate their functionality using only black-box access. In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. This setup has not been considered by previous model stealing methods. Lacking access to the victim's input states distribution, Stealthy Imitation fits a reward model that allows to approximate it. We show that the victim policy is harder to imitate when the distribution of the attack queries matches that of the victim. We evaluate our approach across diverse, high-dimensional control tasks and consistently outperform prior data-free approaches adapted for policy stealing. Lastly, we propose a countermeasure that significantly diminishes the effectiveness of the attack.
- [178] arXiv:2405.07006 [pdf, ps, html, other]
-
Title: Word-specific tonal realizations in MandarinSubjects: Computation and Language (cs.CL)
The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' meanings. We first show, on the basis of a Taiwan corpus of spontaneous conversations, using the generalized additive regression model, and focusing on the rise-fall tone pattern, that after controlling for effects of speaker and context, word type is a stronger predictor of pitch realization than all the previously established word-form related predictors combined. Importantly, the addition of information about meaning in context improves prediction accuracy even further. We then proceed to show, using computational modeling with context-specific word embeddings, that token-specific pitch contours predict word type with 50% accuracy on held-out data, and that context-sensitive, token-specific embeddings can predict the shape of pitch contours with 30% accuracy. These accuracies, which are an order of magnitude above chance level, suggest that the relation between words' pitch contours and their meanings are sufficiently strong to be functional for language users. The theoretical implications of these empirical findings are discussed.
- [179] arXiv:2405.07007 [pdf, ps, html, other]
-
Title: A New Algorithm for Computing Branch Number of Non-Singular Matrices over Finite FieldsSubjects: Cryptography and Security (cs.CR)
The notion of branch numbers of a linear transformation is crucial for both linear and differential cryptanalysis. The number of non-zero elements in a state difference or linear mask directly correlates with the active S-Boxes. The differential or linear branch number indicates the minimum number of active S-Boxes in two consecutive rounds of an SPN cipher, specifically for differential or linear cryptanalysis, respectively. This paper presents a new algorithm for computing the branch number of non-singular matrices over finite fields. The algorithm is based on the existing classical method but demonstrates improved computational complexity compared to its predecessor. We conduct a comparative study of the proposed algorithm and the classical approach, providing an analytical estimation of the algorithm's complexity. Our analysis reveals that the computational complexity of our algorithm is the square root of that of the classical approach.
- [180] arXiv:2405.07010 [pdf, ps, html, other]
-
Title: Deciphering public attention to geoengineering and climate issues using machine learning and dynamic analysisComments: 46 page, 6 main figures and SISubjects: Computers and Society (cs.CY); Computation and Language (cs.CL)
As the conversation around using geoengineering to combat climate change intensifies, it is imperative to engage the public and deeply understand their perspectives on geoengineering research, development, and potential deployment. Through a comprehensive data-driven investigation, this paper explores the types of news that captivate public interest in geoengineering. We delved into 30,773 English-language news articles from the BBC and the New York Times, combined with Google Trends data spanning 2018 to 2022, to explore how public interest in geoengineering fluctuates in response to news coverage of broader climate issues. Using BERT-based topic modeling, sentiment analysis, and time-series regression models, we found that positive sentiment in energy-related news serves as a good predictor of heightened public interest in geoengineering, a trend that persists over time. Our findings suggest that public engagement with geoengineering and climate action is not uniform, with some topics being more potent in shaping interest over time, such as climate news related to energy, disasters, and politics. Understanding these patterns is crucial for scientists, policymakers, and educators aiming to craft effective strategies for engaging with the public and fostering dialogue around emerging climate technologies.
- [181] arXiv:2405.07011 [pdf, ps, other]
-
Title: Fair Graph Representation Learning via Sensitive Attribute DisentanglementComments: Accepted by WWW 2024Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Group fairness for Graph Neural Networks (GNNs), which emphasizes algorithmic decisions neither favoring nor harming certain groups defined by sensitive attributes (e.g., race and gender), has gained considerable attention. In particular, the objective of group fairness is to ensure that the decisions made by GNNs are independent of the sensitive attribute. To achieve this objective, most existing approaches involve eliminating sensitive attribute information in node representations or algorithmic decisions. However, such ways may also eliminate task-related information due to its inherent correlation with the sensitive attribute, leading to a sacrifice in utility. In this work, we focus on improving the fairness of GNNs while preserving task-related information and propose a fair GNN framework named FairSAD. Instead of eliminating sensitive attribute information, FairSAD enhances the fairness of GNNs via Sensitive Attribute Disentanglement (SAD), which separates the sensitive attribute-related information into an independent component to mitigate its impact. Additionally, FairSAD utilizes a channel masking mechanism to adaptively identify the sensitive attribute-related component and subsequently decorrelates it. Overall, FairSAD minimizes the impact of the sensitive attribute on GNN outcomes rather than eliminating sensitive attributes, thereby preserving task-related information associated with the sensitive attribute. Furthermore, experiments conducted on several real-world datasets demonstrate that FairSAD outperforms other state-of-the-art methods by a significant margin in terms of both fairness and utility performance. Our source code is available at this https URL.
- [182] arXiv:2405.07012 [pdf, ps, other]
-
Title: Incorporating Degradation Estimation in Light Field Spatial Super-ResolutionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advancements in light field super-resolution (SR) have yielded impressive results. In practice, however, many existing methods are limited by assuming fixed degradation models, such as bicubic downsampling, which hinders their robustness in real-world scenarios with complex degradations. To address this limitation, we present LF-DEST, an effective blind Light Field SR method that incorporates explicit Degradation Estimation to handle various degradation types. LF-DEST consists of two primary components: degradation estimation and light field restoration. The former concurrently estimates blur kernels and noise maps from low-resolution degraded light fields, while the latter generates super-resolved light fields based on the estimated degradations. Notably, we introduce a modulated and selective fusion module that intelligently combines degradation representations with image information, allowing for effective handling of diverse degradation types. We conduct extensive experiments on benchmark datasets, demonstrating that LF-DEST achieves superior performance across a variety of degradation scenarios in light field SR.
- [183] arXiv:2405.07017 [pdf, ps, html, other]
-
Title: Robot Agnostic Visual Servoing considering kinematic constraints enabled by a decoupled network trajectory planner structureComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Robotics (cs.RO)
We propose a visual servoing method consisting of a detection network and a velocity trajectory planner. First, the detection network estimates the objects position and orientation in the image space. Furthermore, these are normalized and filtered. The direction and orientation is then the input to the trajectory planner, which considers the kinematic constrains of the used robotic system. This allows safe and stable control, since the kinematic boundary values are taken into account in planning. Also, by having direction estimation and velocity planner separated, the learning part of the method does not directly influence the control value. This also enables the transfer of the method to different robotic systems without retraining, therefore being robot agnostic. We evaluate our method on different visual servoing tasks with and without clutter on two different robotic systems. Our method achieved mean absolute position errors of <0.5 mm and orientation errors of <1°. Additionally, we transferred the method to a new system which differs in robot and camera, emphasizing robot agnostic capability of our method.
- [184] arXiv:2405.07018 [pdf, ps, html, other]
-
Title: Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You ThoughtXiaoxiao Chi, Xuyun Zhang, Yan Wang, Lianyong Qi, Amin Beheshti, Xiaolong Xu, Kim-Kwang Raymond Choo, Shuo Wang, Hongsheng HuComments: This paper has been accepted by IJCAI-24Subjects: Cryptography and Security (cs.CR)
Recommender systems have been successfully applied in many applications. Nonetheless, recent studies demonstrate that recommender systems are vulnerable to membership inference attacks (MIAs), leading to the leakage of users' membership privacy. However, existing MIAs relying on shadow training suffer a large performance drop when the attacker lacks knowledge of the training data distribution and the model architecture of the target recommender system. To better understand the privacy risks of recommender systems, we propose shadow-free MIAs that directly leverage a user's recommendations for membership inference. Without shadow training, the proposed attack can conduct MIAs efficiently and effectively under a practice scenario where the attacker is given only black-box access to the target recommender system. The proposed attack leverages an intuition that the recommender system personalizes a user's recommendations if his historical interactions are used by it. Thus, an attacker can infer membership privacy by determining whether the recommendations are more similar to the interactions or the general popular items. We conduct extensive experiments on benchmark datasets across various recommender systems. Remarkably, our attack achieves far better attack accuracy with low false positive rates than baselines while with a much lower computational cost.
- [185] arXiv:2405.07020 [pdf, ps, html, other]
-
Title: Adaptive Online Bayesian Estimation of Frequency Distributions with Local Differential PrivacyComments: Code for experiments available at this https URLSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
We propose a novel Bayesian approach for the adaptive and online estimation of the frequency distribution of a finite number of categories under the local differential privacy (LDP) framework. The proposed algorithm performs Bayesian parameter estimation via posterior sampling and adapts the randomization mechanism for LDP based on the obtained posterior samples. We propose a randomized mechanism for LDP which uses a subset of categories as an input and whose performance depends on the selected subset and the true frequency distribution. By using the posterior sample as an estimate of the frequency distribution, the algorithm performs a computationally tractable subset selection step to maximize the utility of the privatized response of the next user. We propose several utility functions related to well-known information metrics, such as (but not limited to) Fisher information matrix, total variation distance, and information entropy. We compare each of these utility metrics in terms of their computational complexity. We employ stochastic gradient Langevin dynamics for posterior sampling, a computationally efficient approximate Markov chain Monte Carlo method. We provide a theoretical analysis showing that (i) the posterior distribution targeted by the algorithm converges to the true parameter even for approximate posterior sampling, and (ii) the algorithm selects the optimal subset with high probability if posterior sampling is performed exactly. We also provide numerical results that empirically demonstrate the estimation accuracy of our algorithm where we compare it with nonadaptive and semi-adaptive approaches under experimental settings with various combinations of privacy parameters and population distribution parameters.
- [186] arXiv:2405.07022 [pdf, ps, html, other]
-
Title: DTMamba : Dual Twin Mamba for Time Series ForecastingSubjects: Machine Learning (cs.LG); Databases (cs.DB)
We utilized the Mamba model for time series data prediction tasks, and the experimental results indicate that our model performs well.
- [187] arXiv:2405.07024 [pdf, ps, other]
-
Title: Demystifying the Hypercomplex: Inductive Biases in Hypercomplex Deep LearningComments: Accepted for Publication in IEEE Signal Processing MagazineSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Hypercomplex algebras have recently been gaining prominence in the field of deep learning owing to the advantages of their division algebras over real vector spaces and their superior results when dealing with multidimensional signals in real-world 3D and 4D paradigms. This paper provides a foundational framework that serves as a roadmap for understanding why hypercomplex deep learning methods are so successful and how their potential can be exploited. Such a theoretical framework is described in terms of inductive bias, i.e., a collection of assumptions, properties, and constraints that are built into training algorithms to guide their learning process toward more efficient and accurate solutions. We show that it is possible to derive specific inductive biases in the hypercomplex domains, which extend complex numbers to encompass diverse numbers and data structures. These biases prove effective in managing the distinctive properties of these domains, as well as the complex structures of multidimensional and multimodal signals. This novel perspective for hypercomplex deep learning promises to both demystify this class of methods and clarify their potential, under a unifying framework, and in this way promotes hypercomplex models as viable alternatives to traditional real-valued deep learning for multidimensional signal processing.
- [188] arXiv:2405.07027 [pdf, ps, other]
-
Title: TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field OptimizationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method introduces monocular depth priors to jointly optimize the camera poses and NeRF, which fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses - by jointly optimizing learnable parameters of the radiance field and camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. The experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at this https URL.
- [189] arXiv:2405.07029 [pdf, ps, other]
-
Title: A framework of text-dependent speaker verification for chinese numerical string corpusComments: arXiv admin note: text overlap with arXiv:2312.01645Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
The Chinese numerical string corpus, serves as a valuable resource for speaker verification, particularly in financial transactions. Researches indicate that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, that can be negatively impacted by reading rhythms and pauses. To address this problem, we propose an end-to-end speaker verification system that enhances TD-SV by decoupling speaker and text information. Our system consists of a text embedding extractor, a speaker embedding extractor and a fusion module. In the text embedding extractor, we employ an enhanced Transformer and introduce a triple loss including text classification loss, connectionist temporal classification (CTC) loss and decoder loss; while in the speaker embedding extractor, we create a multi-scale pooling method by combining sliding window attentive statistics pooling (SWASP) with attentive statistics pooling (ASP). To mitigate the scarcity of data, we have recorded a publicly available Chinese numerical corpus named SHALCAS22A (hereinafter called SHAL), which can be accessed on Open-SLR. Moreover, we employ data augmentation techniques using Tacotron2 and HiFi-GAN. Our method achieves an equal error rate (EER) performance improvement of 49.2% on Hi-Mia and 75.0% on SHAL, respectively.
- [190] arXiv:2405.07030 [pdf, ps, other]
-
Title: Lasso Ridge based XGBoost and Deep_LSTM Help Tennis Players Perform betterComments: 22 pages, 11 figuresSubjects: Machine Learning (cs.LG)
Understanding the dynamics of momentum and game fluctuation in tennis matches is cru-cial for predicting match outcomes and enhancing player performance. In this study, we present a comprehensive analysis of these factors using a dataset from the 2023 Wimbledon final. Ini-tially, we develop a sliding-window-based scoring model to assess player performance, ac-counting for the influence of serving dominance through a serve decay factor. Additionally, we introduce a novel approach, Lasso-Ridge-based XGBoost, to quantify momentum effects, lev-eraging the predictive power of XGBoost while mitigating overfitting through regularization. Through experimentation, we achieve an accuracy of 94% in predicting match outcomes, iden-tifying key factors influencing winning rates. Subsequently, we propose a Derivative of the winning rate algorithm to quantify game fluctuation, employing an LSTM_Deep model to pre-dict fluctuation scores. Our model effectively captures temporal correlations in momentum fea-tures, yielding mean squared errors ranging from 0.036 to 0.064. Furthermore, we explore me-ta-learning using MAML to transfer our model to predict outcomes in ping-pong matches, though results indicate a comparative performance decline. Our findings provide valuable in-sights into momentum dynamics and game fluctuation, offering implications for sports analytics and player training strategies.
- [191] arXiv:2405.07031 [pdf, ps, other]
-
Title: Global Motion Understanding in Large-Scale Video Object SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we show that transferring knowledge from other domains of video understanding combined with large-scale learning can improve robustness of Video Object Segmentation (VOS) under complex circumstances. Namely, we focus on integrating scene global motion knowledge to improve large-scale semi-supervised Video Object Segmentation. Prior works on VOS mostly rely on direct comparison of semantic and contextual features to perform dense matching between current and past frames, passing over actual motion structure. On the other hand, Optical Flow Estimation task aims to approximate the scene motion field, exposing global motion patterns which are typically undiscoverable during all pairs similarity search. We present WarpFormer, an architecture for semi-supervised Video Object Segmentation that exploits existing knowledge in motion understanding to conduct smoother propagation and more accurate matching. Our framework employs a generic pretrained Optical Flow Estimation network whose prediction is used to warp both past frames and instance segmentation masks to the current frame domain. Consequently, warped segmentation masks are refined and fused together aiming to inpaint occluded regions and eliminate artifacts caused by flow field imperfects. Additionally, we employ novel large-scale MOSE 2023 dataset to train model on various complex scenarios. Our method demonstrates strong performance on DAVIS 2016/2017 validation (93.0% and 85.9%), DAVIS 2017 test-dev (80.6%) and YouTube-VOS 2019 validation (83.8%) that is competitive with alternative state-of-the-art methods while using much simpler memory mechanism and instance understanding logic.
- [192] arXiv:2405.07033 [pdf, ps, other]
-
Title: A Performance Analysis Modeling Framework for Extended Reality Applications in Edge-Assisted Wireless NetworksComments: 12 pages, 4 figures; To appear in Proceedings of IEEE International Conference on Distributed Computing Systems (ICDCS), 2024Subjects: Networking and Internet Architecture (cs.NI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Image and Video Processing (eess.IV)
Extended reality (XR) is at the center of attraction in the research community due to the emergence of augmented, mixed, and virtual reality applications. The performance of such applications needs to be uptight to maintain the requirements of latency, energy consumption, and freshness of data. Therefore, a comprehensive performance analysis model is required to assess the effectiveness of an XR application but is challenging to design due to the dependence of the performance metrics on several difficult-to-model parameters, such as computing resources and hardware utilization of XR and edge devices, which are controlled by both their operating systems and the application itself. Moreover, the heterogeneity in devices and wireless access networks brings additional challenges in modeling. In this paper, we propose a novel modeling framework for performance analysis of XR applications considering edge-assisted wireless networks and validate the model with experimental data collected from testbeds designed specifically for XR applications. In addition, we present the challenges associated with performance analysis modeling and present methods to overcome them in detail. Finally, the performance evaluation shows that the proposed analytical model can analyze XR applications' performance with high accuracy compared to the state-of-the-art analytical models.
- [193] arXiv:2405.07034 [pdf, ps, other]
-
Title: Towards an Accessible and Rapidly Trainable Rhythm Sequencer Using a Generative Stacked AutoencoderComments: 7 pages, 7 figuresSubjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Neural networks and deep learning are often deployed for the sake of the most comprehensive music generation with as little involvement as possible from the human musician. Implementations in aid of, or being a tool for, music practitioners are sparse. This paper proposes the integration of generative stacked autoencoder structures for rhythm generation, within a conventional melodic step-sequencer. It further aims to work towards its implementation being accessible to the average electronic music practitioner. Several model architectures have been trained and tested for their creative potential. While the currently implementations do display limitations, they do represent viable creative solutions for music practitioners.
- [194] arXiv:2405.07035 [pdf, ps, other]
-
Title: A Turkish Educational Crossword PuzzleComments: This paper has been accepted for presentation at AIED2024 LBRSubjects: Computation and Language (cs.CL)
This paper introduces the first Turkish crossword puzzle generator designed to leverage the capabilities of large language models (LLMs) for educational purposes. In this work, we introduced two specially created datasets: one with over 180,000 unique answer-clue pairs for generating relevant clues from the given answer, and another with over 35,000 samples containing text, answer, category, and clue data, aimed at producing clues for specific texts and keywords within certain categories. Beyond entertainment, this generator emerges as an interactive educational tool that enhances memory, vocabulary, and problem-solving skills. It's a notable step in AI-enhanced education, merging game-like engagement with learning for Turkish and setting new standards for interactive, intelligent learning tools in Turkish.
- [195] arXiv:2405.07037 [pdf, ps, other]
-
Title: Robust Online Convex Optimization for Disturbance RejectionSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Online convex optimization (OCO) is a powerful tool for learning sequential data, making it ideal for high precision control applications where the disturbances are arbitrary and unknown in advance. However, the ability of OCO-based controllers to accurately learn the disturbance while maintaining closed-loop stability relies on having an accurate model of the plant. This paper studies the performance of OCO-based controllers for linear time-invariant (LTI) systems subject to disturbance and model uncertainty. The model uncertainty can cause the closed-loop to become unstable. We provide a sufficient condition for robust stability based on the small gain theorem. This condition is easily incorporated as an on-line constraint in the OCO controller. Finally, we verify via numerical simulations that imposing the robust stability condition on the OCO controller ensures closed-loop stability.
- [196] arXiv:2405.07038 [pdf, ps, other]
-
Title: Conformal Online Auction DesignSubjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper proposes the conformal online auction design (COAD), a novel mechanism for maximizing revenue in online auctions by quantifying the uncertainty in bidders' values without relying on assumptions about value distributions. COAD incorporates both the bidder and item features and leverages historical data to provide an incentive-compatible mechanism for online auctions. Unlike traditional methods for online auctions, COAD employs a distribution-free, prediction interval-based approach using conformal prediction techniques. This novel approach ensures that the expected revenue from our mechanism can achieve at least a constant fraction of the revenue generated by the optimal mechanism. Additionally, COAD admits the use of a broad array of modern machine-learning methods, including random forests, kernel methods, and deep neural nets, for predicting bidders' values. It ensures revenue performance under any finite sample of historical data. Moreover, COAD introduces bidder-specific reserve prices based on the lower confidence bounds of bidders' valuations, which is different from the uniform reserve prices commonly used in the literature. We validate our theoretical predictions through extensive simulations and a real-data application. All code for using COAD and reproducing results is made available on GitHub.
- [197] arXiv:2405.07040 [pdf, ps, other]
-
Title: Low-Complexity OTFS-Based Over-the-Air Computation Design for Time-Varying ChannelsComments: 14 pages, 11 figures, submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:2403.11272Subjects: Information Theory (cs.IT)
This paper investigates over-the-air computation (AirComp) over multiple-access time-varying channels, where devices with high mobility transmit their sensing data to a fusion center (FC) for averaging. To combat the Doppler shift induced by time-varying channels, each device adopts orthogonal time frequency space (OTFS) modulation. Our objective is minimizing the mean squared error (MSE) for the target function estimation. Due to the multipath time-varying channels, the OTFS-based AirComp not only suffers from noise but also interference. Specifically, we propose three schemes, namely S1, S2, and S3, for the target function estimation. S1 directly estimates the target function under the impacts of noise and interference. S2 mitigates the interference by introducing a zero padding-assisted OTFS. In S3, we propose an iterative algorithm to estimate the function in a matrix form. In the numerical results, we evaluate the performance of S1, S2, and S3 from the perspectives of MSE and computational complexity, and compare them with benchmarks. Specifically, compared to benchmarks, S3 outperforms them with a significantly lower MSE but incurs a higher computational complexity. In contrast, S2 demonstrates a reduction in both MSE and computational complexity. Lastly, S1 shows superior error performance at small SNR and reduced computational complexity.
- [198] arXiv:2405.07041 [pdf, ps, other]
-
Title: Multi-agent Traffic Prediction via Denoised Endpoint DistributionSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
The exploration of high-speed movement by robots or road traffic agents is crucial for autonomous driving and navigation. Trajectory prediction at high speeds requires considering historical features and interactions with surrounding entities, a complexity not as pronounced in lower-speed environments. Prior methods have assessed the spatio-temporal dynamics of agents but often neglected intrinsic intent and uncertainty, thereby limiting their effectiveness. We present the Denoised Endpoint Distribution model for trajectory prediction, which distinctively models agents' spatio-temporal features alongside their intrinsic intentions and uncertainties. By employing Diffusion and Transformer models to focus on agent endpoints rather than entire trajectories, our approach significantly reduces model complexity and enhances performance through endpoint information. Our experiments on open datasets, coupled with comparison and ablation studies, demonstrate our model's efficacy and the importance of its components. This approach advances trajectory prediction in high-speed scenarios and lays groundwork for future developments.
- [199] arXiv:2405.07043 [pdf, ps, other]
-
Title: Optimal Multilayered Motion Planning for Multiple Differential Drive Mobile Robots with Hierarchical Prioritization (OM-MP)Subjects: Robotics (cs.RO)
We present a novel framework for addressing the challenges of multi-Agent planning and formation control within intricate and dynamic environments. This framework transforms the Multi-Agent Path Finding (MAPF) problem into a Multi-Agent Trajectory Planning (MATP) problem. Unlike traditional MAPF solutions, our multilayer optimization scheme consists of a global planner optimization solver, which is dedicated to determining concise global paths for each individual robot, and a local planner with an embedded optimization solver aimed at ensuring the feasibility of local robot trajectories. By implementing a hierarchical prioritization strategy, we enhance robots' efficiency and approximate the global optimal solution. Specifically, within the global planner, we employ the Augmented Graph Search (AGS) algorithm, which significantly improves the speed of solutions. Meanwhile, within the local planner optimization solver, we utilize Control Barrier functions (CBFs) and introduced an oblique cylindrical obstacle bounding box based on the time axis for obstacle avoidance and construct a single-robot locally aware-communication circle to ensure the simplicity, speed, and accuracy of locally optimized solutions. Additionally, we integrate the weight and priority of path traces to prevent deadlocks in limiting scenarios. Compared to the other state-of-the-art methods, including CBS, ECBS and other derivative algorithms, our proposed method demonstrates superior performance in terms of capacity, flexible scalability and overall task optimality in theory, as validated through simulations and experiments.
- [200] arXiv:2405.07044 [pdf, ps, other]
-
Title: Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion PriorSubjects: Computer Vision and Pattern Recognition (cs.CV)
Remote sensing images captured by different platforms exhibit significant disparities in spatial resolution. Large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit. However, existing methods confront challenges in recovering SR images with clear textures and correct ground objects. We introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution. The framework exploits a pre-trained generative model as a prior to generate perceptually plausible SR images. We further enhance the reconstruction by incorporating vector maps, which carry structural and semantic cues. Moreover, pixel-level inconsistencies in paired remote sensing images, stemming from sensor-specific imaging characteristics, may hinder the convergence of the model and diversity in generated results. To address this problem, we propose to extract the sensor-specific imaging characteristics and model the distribution of them, allowing diverse SR images generation based on imaging characteristics provided by reference images or sampled from the imaging characteristic probability distributions. To validate and evaluate our approach, we create the Cross-Modal Super-Resolution Dataset (CMSRD). Qualitative and quantitative experiments on CMSRD showcase the superiority and broad applicability of our method. Experimental results on downstream vision tasks also demonstrate the utilitarian of the generated SR images. The dataset and code will be publicly available at this https URL
- [201] arXiv:2405.07045 [pdf, ps, other]
-
Title: Predictive Modeling in the Reservoir Kernel Motif SpaceComments: 8 pagesSubjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This work proposes a time series prediction method based on the kernel view of linear reservoirs. In particular, the time series motifs of the reservoir kernel are used as representational basis on which general readouts are constructed. We provide a geometric interpretation of our approach shedding light on how our approach is related to the core reservoir models and in what way the two approaches differ. Empirical experiments then compare predictive performances of our suggested model with those of recent state-of-art transformer based models, as well as the established recurrent network model - LSTM. The experiments are performed on both univariate and multivariate time series and with a variety of prediction horizons. Rather surprisingly we show that even when linear readout is employed, our method has the capacity to outperform transformer models on univariate time series and attain competitive results on multivariate benchmark datasets. We conclude that simple models with easily controllable capacity but capturing enough memory and subsequence structure can outperform potentially over-complicated deep learning models. This does not mean that reservoir motif based models are preferable to other more complex alternatives - rather, when introducing a new complex time series model one should employ as a sanity check simple, but potentially powerful alternatives/baselines such as reservoir models or the models introduced here.
- [202] arXiv:2405.07046 [pdf, ps, other]
-
Title: Retrieval Enhanced Zero-Shot Video CaptioningSubjects: Computer Vision and Pattern Recognition (cs.CV)
Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose to take advantage of existing pre-trained large-scale vision and language models to directly generate captions with test time adaptation. Specifically, we bridge video and text using three key models: a general video understanding model XCLIP, a general image understanding model CLIP, and a text generation model GPT-2, due to their source-code availability. The main challenge is how to enable the text generation model to be sufficiently aware of the content in a given video so as to generate corresponding captions. To address this problem, we propose using learnable tokens as a communication medium between frozen GPT-2 and frozen XCLIP as well as frozen CLIP. Differing from the conventional way to train these tokens with training data, we update these tokens with pseudo-targets of the inference data under several carefully crafted loss functions which enable the tokens to absorb video information catered for GPT-2. This procedure can be done in just a few iterations (we use 16 iterations in the experiments) and does not require ground truth data. Extensive experimental results on three widely used datasets, MSR-VTT, MSVD, and VATEX, show 4% to 20% improvements in terms of the main metric CIDEr compared to the existing state-of-the-art methods.
- [203] arXiv:2405.07047 [pdf, ps, other]
-
Title: Unsupervised Density Neural Representation for CT Metal Artifact ReductionQing Wu, Xu Guo, Lixuan Chen, Dongming He, Hongjiang Wei, Xudong Wang, S. Kevin Zhou, Yifeng Zhang, Jingyi Yu, Yuyao ZhangComments: 13 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
Emerging unsupervised reconstruction techniques based on implicit neural representation (INR), such as NeRP, CoIL, and SCOPE, have shown unique capabilities in CT linear inverse imaging. In this work, we propose a novel unsupervised density neural representation (Diner) to tackle the challenging problem of CT metal artifacts when scanned objects contain metals. The drastic variation of linear attenuation coefficients (LACs) of metals over X-ray spectra leads to a nonlinear beam hardening effect (BHE) in CT measurements. Recovering CT images from metal-affected measurements therefore poses a complicated nonlinear inverse problem. Existing metal artifact reduction (MAR) techniques mostly formulate the MAR as an image inpainting task, which ignores the energy-induced BHE and produces suboptimal performance. Instead, our Diner introduces an energy-dependent polychromatic CT forward model to the INR framework, addressing the nonlinear nature of the MAR problem. Specifically, we decompose the energy-dependent LACs into energy-independent densities and energy-dependent mass attenuation coefficients (MACs) by fully considering the physical model of X-ray absorption. Using the densities as pivot variables and the MACs as known prior knowledge, the LACs can be accurately reconstructed from the raw measurements. Technically, we represent the unknown density map as an implicit function of coordinates. Combined with a novel differentiable forward model simulating the physical acquisition from the densities to the measurements, our Diner optimizes a multi-layer perception network to approximate the implicit function by minimizing predicted errors between the estimated and real measurements. Experimental results on simulated and real datasets confirm the superiority of our unsupervised Diner against popular supervised techniques in MAR performance and robustness.
- [204] arXiv:2405.07052 [pdf, ps, other]
-
Title: Length-Aware Multi-Kernel Transformer for Long Document ClassificationComments: Accepted to SEM 2024Subjects: Computation and Language (cs.CL)
Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption. While existing state-of-the-art (SOTA) models segment long texts into equal-length snippets (e.g., 128 tokens per snippet) or deploy sparse attention networks, these methods have new challenges of context fragmentation and generalizability due to sentence boundaries and varying text lengths. For example, our empirical analysis has shown that SOTA models consistently overfit one set of lengthy documents (e.g., 2000 tokens) while performing worse on texts with other lengths (e.g., 1000 or 4000). In this study, we propose a Length-Aware Multi-Kernel Transformer (LAMKIT) to address the new challenges for the long document classification. LAMKIT encodes lengthy documents by diverse transformer-based kernels for bridging context boundaries and vectorizes text length by the kernels to promote model robustness over varying document lengths. Experiments on five standard benchmarks from health and law domains show LAMKIT outperforms SOTA models up to an absolute 10.9% improvement. We conduct extensive ablation analyses to examine model robustness and effectiveness over varying document lengths.
- [205] arXiv:2405.07054 [pdf, ps, other]
-
Title: LUCID: A Framework for Reducing False Positives and Inconsistencies Among Container Scanning ToolsComments: 13 pages, 15 figures, 8 tablesSubjects: Cryptography and Security (cs.CR)
Containerization has emerged as a revolutionary technology in the software development and deployment industry. Containers offer a portable and lightweight solution that allows for packaging applications and their dependencies systematically and efficiently. In addition, containers offer faster deployment and near-native performance with isolation and security drawbacks compared to Virtual Machines. To address the security issues, scanning tools that scan containers for preexisting vulnerabilities have been developed, but they suffer from false positives. Moreover, using different scanning tools to scan the same container provides different results, which leads to inconsistencies and confusion. Limited work has been done to address these issues. This paper provides a fully functional and extensible framework named LUCID that can reduce false positives and inconsistencies provided by multiple scanning tools. We use a database-centric approach and perform query-based analysis, to pinpoint the causes for inconsistencies. Our results show that our framework can reduce inconsistencies by 70%. The framework has been tested on both Intel64/AMD64 and ARM architecture. We also create a Dynamic Classification component that can successfully classify and predict the different severity levels with an accuracy of 84%. We believe this paper will raise awareness regarding security in container technologies and enable container scanning companies to improve their tool to provide better and more consistent results.
- [206] arXiv:2405.07056 [pdf, ps, other]
-
Title: Graph $p$-Laplacian eigenpairs as saddle points of a family of spectral energy functionsSubjects: Numerical Analysis (math.NA)
We address the problem of computing the graph $p$-Laplacian eigenpairs for $p\in (2,\infty)$. We propose a reformulation of the graph $p$-Laplacian eigenvalue problem in terms of a constrained weighted Laplacian eigenvalue problem and discuss theoretical and computational advantages. We provide a correspondence between $p$-Laplacian eigenpairs and linear eigenpair of a constrained generalized weighted Laplacian eigenvalue problem. As a result, we can assign an index to any $p$-Laplacian eigenpair that matches the Morse index of the $p$-Rayleigh quotient evaluated at the eigenfunction. In the second part of the paper we introduce a class of spectral energy functions that depend on edge and node weights. We prove that differentiable saddle points of the $k$-th energy function correspond to $p$-Laplacian eigenpairs having index equal to $k$. Moreover, the first energy function is proved to possess a unique saddle point which corresponds to the unique first $p$-Laplacian eigenpair. Finally we develop novel gradient-based numerical methods suited to compute $p$-Laplacian eigenpairs for any $p\in(2,\infty)$ and present some experiments.
- [207] arXiv:2405.07059 [pdf, ps, other]
-
Title: Numerical Analysis of Finite Dimensional Approximations in Finite Temperature DFTComments: 20 pages, 6 figuresSubjects: Numerical Analysis (math.NA)
In this paper, we study numerical approximations of the ground states in finite temperature density functional theory. We formulate the problem with respect to the density matrices and justify the convergence of the finite dimensional approximations. Moreover, we provide an optimal a priori error estimate under some mild assumptions and present some numerical experiments to support the theory.
- [208] arXiv:2405.07060 [pdf, ps, other]
-
Title: Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind PeopleSubjects: Robotics (cs.RO)
Visual Language Navigation (VLN) powered navigation robots have the potential to guide blind people by understanding and executing route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes described from human memory, which frequently contain stutters, errors, and omission of details as opposed to those obtained by thinking out loud, such as in the Room-to-Room dataset. However, currently, there is no benchmark that simulates instructions that were obtained from human memory in environments where blind people navigate. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. To collect natural language instructions, we conducted two studies from sighted passersby onsite and annotators online. Our analysis demonstrates that instructions data collected onsite were more lengthy and contained more varied wording. Alongside our benchmark, we propose a VLN model better equipped to handle the scenario. Our proposed VLN model uses Large Language Models (LLM) to parse instructions and generate Python codes for robot control. We further show that the existing state-of-the-art model performed suboptimally on our benchmark. In contrast, our proposed method outperformed the state-of-the-art model by a fair margin. We found that future research should exercise caution when considering VLN technology for practical applications, as real-world scenarios have different characteristics than ones collected in traditional settings.
- [209] arXiv:2405.07061 [pdf, ps, other]
-
Title: LLMs and the Future of Chip Design: Unveiling Security Risks and Building TrustSubjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Cryptography and Security (cs.CR)
Chip design is about to be revolutionized by the integration of large language, multimodal, and circuit models (collectively LxMs). While exploring this exciting frontier with tremendous potential, the community must also carefully consider the related security risks and the need for building trust into using LxMs for chip design. First, we review the recent surge of using LxMs for chip design in general. We cover state-of-the-art works for the automation of hardware description language code generation and for scripting and guidance of essential but cumbersome tasks for electronic design automation tools, e.g., design-space exploration, tuning, or designer training. Second, we raise and provide initial answers to novel research questions on critical issues for security and trustworthiness of LxM-powered chip design from both the attack and defense perspectives.
- [210] arXiv:2405.07065 [pdf, ps, other]
-
Title: LogoMotion: Visually Grounded Code Generation for Content-Aware AnimationVivian Liu, Rubaiat Habib Kazi, Li-Yi Wei, Matthew Fisher, Timothy Langlois, Seth Walker, Lydia ChiltonSubjects: Human-Computer Interaction (cs.HC)
Animated logos are a compelling and ubiquitous way individuals and brands represent themselves online. Manually authoring these logos can require significant artistic skill and effort. To help novice designers animate logos, design tools currently offer templates and animation presets. However, these solutions can be limited in their expressive range. Large language models have the potential to help novice designers create animated logos by generating animation code that is tailored to their content. In this paper, we introduce LogoMotion, an LLM-based system that takes in a layered document and generates animated logos through visually-grounded program synthesis. We introduce techniques to create an HTML representation of a canvas, identify primary and secondary elements, synthesize animation code, and visually debug animation errors. When compared with an industry standard tool, we find that LogoMotion produces animations that are more content-aware and are on par in terms of quality. We conclude with a discussion of the implications of LLM-generated animation for motion design.
- [211] arXiv:2405.07067 [pdf, ps, other]
-
Title: Learning Flame Evolution Operator under Hybrid Darrieus Landau and Diffusive Thermal InstabilityComments: 25 page, 10 figuresSubjects: Machine Learning (cs.LG)
Recent advancements in the integration of artificial intelligence (AI) and machine learning (ML) with physical sciences have led to significant progress in addressing complex phenomena governed by nonlinear partial differential equations (PDE). This paper explores the application of novel operator learning methodologies to unravel the intricate dynamics of flame instability, particularly focusing on hybrid instabilities arising from the coexistence of Darrieus-Landau (DL) and Diffusive-Thermal (DT) mechanisms. Training datasets encompass a wide range of parameter configurations, enabling the learning of parametric solution advancement operators using techniques such as parametric Fourier Neural Operator (pFNO), and parametric convolutional neural networks (pCNN). Results demonstrate the efficacy of these methods in accurately predicting short-term and long-term flame evolution across diverse parameter regimes, capturing the characteristic behaviors of pure and blended instabilities. Comparative analyses reveal pFNO as the most accurate model for learning short-term solutions, while all models exhibit robust performance in capturing the nuanced dynamics of flame evolution. This research contributes to the development of robust modeling frameworks for understanding and controlling complex physical processes governed by nonlinear PDE.
- [212] arXiv:2405.07070 [pdf, ps, other]
-
Title: Decoding Cognitive Health Using Machine Learning: A Comprehensive Evaluation for Diagnosis of Significant Memory ConcernJournal-ref: WIREs Data Mining and Knowledge Discovery, 2024Subjects: Machine Learning (cs.LG)
The timely identification of significant memory concern (SMC) is crucial for proactive cognitive health management, especially in an aging population. Detecting SMC early enables timely intervention and personalized care, potentially slowing cognitive disorder progression. This study presents a state-of-the-art review followed by a comprehensive evaluation of machine learning models within the randomized neural networks (RNNs) and hyperplane-based classifiers (HbCs) family to investigate SMC diagnosis thoroughly. Utilizing the Alzheimer's Disease Neuroimaging Initiative 2 (ADNI2) dataset, 111 individuals with SMC and 111 healthy older adults are analyzed based on T1W magnetic resonance imaging (MRI) scans, extracting rich features. This analysis is based on baseline structural MRI (sMRI) scans, extracting rich features from gray matter (GM), white matter (WM), Jacobian determinant (JD), and cortical thickness (CT) measurements. In RNNs, deep random vector functional link (dRVFL) and ensemble dRVFL (edRVFL) emerge as the best classifiers in terms of performance metrics in the identification of SMC. In HbCs, Kernelized pinball general twin support vector machine (Pin-GTSVM-K) excels in CT and WM features, whereas Linear Pin-GTSVM (Pin-GTSVM-L) and Linear intuitionistic fuzzy TSVM (IFTSVM-L) performs well in the JD and GM features sets, respectively. This comprehensive evaluation emphasizes the critical role of feature selection and model choice in attaining an effective classifier for SMC diagnosis. The inclusion of statistical analyses further reinforces the credibility of the results, affirming the rigor of this analysis. The performance measures exhibit the suitability of this framework in aiding researchers with the automated and accurate assessment of SMC. The source codes of the algorithms and datasets used in this study are available at this https URL.
- [213] arXiv:2405.07072 [pdf, ps, html, other]
-
Title: Selecting focused digital cohorts from social media using the metric backbone of biomedical knowledge graphsSubjects: Social and Information Networks (cs.SI)
The abundance of social media data allows researchers to construct large digital cohorts to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge in that social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically-relevant discourse. To hone in on relevant users anywhere, we have developed a general framework and applied it to epilepsy discourse in social media as a test case. We analyzed the text from posts by users who mention epilepsy drugs in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We curated a medical terms dictionary and used it to generate a knowledge graph (KG) for each online community. For each KG, we computed the metric backbone--the smallest subgraph that preserves all shortest paths in the network. By comparing the subset of users who contribute to the backbone to the subset who do not, we found that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. Furthermore, using human annotation of Instagram posts, we demonstrated that users who do not contribute to the backbone are more than twice as likely to use dictionary terms in a manner inconsistent with their biomedical meaning. For biomedical research applications, our backbone-based approach thus has several benefits over simple engagement-based approaches: It can retain low-engagement users who nonetheless contribute meaningful biomedical insights. It can filter out very vocal users who contribute no relevant content.
- [214] arXiv:2405.07076 [pdf, ps, other]
-
Title: Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language ModelsComments: 26 pages, 7 tables, 6 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This research develops advanced methodologies for Large Language Models (LLMs) to better manage linguistic behaviors related to emotions and ethics. We introduce DIKE, an adversarial framework that enhances the LLMs' ability to internalize and reflect global human values, adapting to varied cultural contexts to promote transparency and trust among users. The methodology involves detailed modeling of emotions, classification of linguistic behaviors, and implementation of ethical guardrails. Our innovative approaches include mapping emotions and behaviors using self-supervised learning techniques, refining these guardrails through adversarial reviews, and systematically adjusting outputs to ensure ethical alignment. This framework establishes a robust foundation for AI systems to operate with ethical integrity and cultural sensitivity, paving the way for more responsible and context-aware AI interactions.
- [215] arXiv:2405.07079 [pdf, ps, other]
-
Title: Host-Based Allocators for Device MemoryComments: 9 pages, 4 figuresSubjects: Software Engineering (cs.SE)
Memory allocation is a fairly mature field of computer science. However, we challenge a prevailing assumption in the literature over the last 50 years which, if reconsidered, necessitates a fundamental reevaluation of many classical memory management algorithms. We pose a model where the allocation algorithm runs on host memory but allocates device memory and so incur the following constraint: the allocator can't read the memory it is allocating.
This means we are unable to use boundary tags, which is a concept that has been ubiquitous in nearly every allocation algorithm. In this paper, we propose alternate algorithms to work around this constraint, and discuss in general the implications of this system model. - [216] arXiv:2405.07081 [pdf, ps, html, other]
-
Title: T-curator: a trust based curation tool for LOD logsSubjects: Databases (cs.DB); Computation and Language (cs.CL)
Nowadays, companies are racing towards Linked Open Data (LOD) to improve their added value, but they are ignoring their SPARQL query logs. If well curated, these logs can present an asset for decision makers. A naive and straightforward use of these logs is too risky because their provenance and quality are highly questionable. Users of these logs in a trusted way have to be assisted by providing them with in-depth knowledge of the whole LOD environment and tools to curate these logs. In this paper, we propose an interactive and intuitive trust based tool that can be used to curate these LOD logs before exploiting them. This tool is proposed to support our approach proposed in our previous work Lanasri et al. [2020].
- [217] arXiv:2405.07083 [pdf, ps, other]
-
Title: Data-Efficient and Robust Task Selection for Meta-LearningComments: Accepted by CVPR 2024 WrokshopSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption is often not valid. In real-world applications, tasks can vary both in their importance during different training stages and in whether they contain noisy labeled data or not, making a uniform approach suboptimal. To address these issues, we propose the Data-Efficient and Robust Task Selection (DERTS) algorithm, which can be incorporated into both gradient and metric-based meta-learning algorithms. DERTS selects weighted subsets of tasks from task pools by minimizing the approximation error of the full gradient of task pools in the meta-training stage. The selected tasks are efficient for rapid training and robust towards noisy label scenarios. Unlike existing algorithms, DERTS does not require any architecture modification for training and can handle noisy label data in both the support and query sets. Analysis of DERTS shows that the algorithm follows similar training dynamics as learning on the full task pools. Experiments show that DERTS outperforms existing sampling strategies for meta-learning on both gradient-based and metric-based meta-learning algorithms in limited data budget and noisy task settings.
- [218] arXiv:2405.07086 [pdf, ps, other]
-
Title: An enhanced basis for producing Bezier-like curvesSubjects: Numerical Analysis (math.NA)
This study aims on proposing a new structure for constructing Bernstein-like bases. The structure uses an auxiliary function and a shape parameter to construct a new family of bases from any family of blending functions. The new family of bases inherit almost all algebraic and geometric properties of the initial blending functions. The corresponding curves have the freedom to travel from the curve constructed from the initial blending functions to the line segment joining the first and last control points. The new bases have the monotonicity preservation property and the shape of the curve could be adjusted by changing the parameter.
- [219] arXiv:2405.07087 [pdf, ps, other]
-
Title: Auditing an Automatic Grading Model with deep Reinforcement LearningSubjects: Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
We explore the use of deep reinforcement learning to audit an automatic short answer grading (ASAG) model. Automatic grading may decrease the time burden of rating open-ended items for educators, but a lack of robust evaluation methods for these models can result in uncertainty of their quality. Current state-of-the-art ASAG models are configured to match human ratings from a training set, and researchers typically assess their quality with accuracy metrics that signify agreement between model and human scores. In this paper, we show that a high level of agreement to human ratings does not give sufficient evidence that an ASAG model is infallible. We train a reinforcement learning agent to revise student responses with the objective of achieving a high rating from an automatic grading model in the least number of revisions. By analyzing the agent's revised responses that achieve a high grade from the ASAG model but would not be considered a high scoring responses according to a scoring rubric, we discover ways in which the automated grader can be exploited, exposing shortcomings in the grading model.
- [220] arXiv:2405.07088 [pdf, ps, other]
-
Title: Towards Context-Aware Modeling of Situation Awareness in Conditionally Automated DrivingComments: 37 Pages, 8 figuresSubjects: Human-Computer Interaction (cs.HC)
Maintaining adequate situation awareness (SA) is crucial for the safe operation of conditionally automated vehicles (AVs), which requires drivers to regain control during takeover (TOR) events. This study developed a predictive model for real-time assessment of driver SA using multimodal data (e.g., galvanic skin response, heart rate and eye tracking data, and driver characteristics) collected in a simulated driving environment. Sixty-seven participants experienced automated driving scenarios with TORs, with conditions varying in risk perception and the presence of automation errors. A LightGBM (Light Gradient Boosting Machine) model trained on the top 12 predictors identified by SHAP (SHapley Additive exPlanations) achieved promising performance with RMSE=0.89, MAE=0.71, and Corr=0.78. These findings have implications towards context-aware modeling of SA in conditionally automated driving, paving the way for safer and more seamless driver-AV interactions.
- [221] arXiv:2405.07089 [pdf, ps, html, other]
-
Title: SonifyAR: Context-Aware Sound Generation in Augmented RealityComments: 12 pages, 12 figuresSubjects: Human-Computer Interaction (cs.HC)
Sound plays a crucial role in enhancing user experience and immersiveness in Augmented Reality (AR). However, current platforms lack support for AR sound authoring due to limited interaction types, challenges in collecting and specifying context information, and difficulty in acquiring matching sound assets. We present SonifyAR, an LLM-based AR sound authoring system that generates context-aware sound effects for AR experiences. SonifyAR expands the current design space of AR sound and implements a Programming by Demonstration (PbD) pipeline to automatically collect contextual information of AR events, including virtual content semantics and real world context. This context information is then processed by a large language model to acquire sound effects with Recommendation, Retrieval, Generation, and Transfer methods. To evaluate the usability and performance of our system, we conducted a user study with eight participants and created five example applications, including an AR-based science experiment, an improving case for AR headset safety, and an assisting example for low vision AR users.
- [222] arXiv:2405.07090 [pdf, ps, other]
-
Title: MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI ModelingSubjects: Human-Computer Interaction (cs.HC)
The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, these require a high-quality UI dataset. Existing datasets are often outdated, collected years ago, and are frequently noisy with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ the best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLMs-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset MUD of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks: element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.
- [223] arXiv:2405.07094 [pdf, ps, other]
-
Title: The Road to Compliance: Executive Federal Agencies and the NIST Risk Management FrameworkComments: This research paper was showcased at the University of West Florida Student Scholars Symposium and Faculty Research Showcase on April 18, 2024. It is supported by the National Science Foundation (NSF) under Grant No. 1946442. The views, findings, and conclusions presented are solely those of the author(s) and do not necessarily represent the views of the NSFSubjects: Cryptography and Security (cs.CR)
This informative report provides a comprehensive analysis of how executive federal report agencies implement the National Institute of Standards and Technology's (NIST) Risk Management Framework (RMF) to achieve cybersecurity compliance. By exploring the concept and evolution of the RMF, the report delves into the framework's importance for enhancing cybersecurity measures within federal agencies, addressing the challenges these agencies face in the digital landscape. Through a methodical literature review, the report examines theoretical foundations, implementation strategies, and the critical role of continuous monitoring and automation in RMF processes, drawing from key sources like Ross (2014), Lubell (2020), Barrett et al. (2021), and Pillitteri et al. (2021, 2022), among others. Employing a detailed methodology for data collection and analysis, the report presents findings on the successes and challenges of RMF implementation, highlighting the impact of automation and continuous monitoring in bolstering cybersecurity postures. Case studies offer in-depth insights into the experiences of specific agencies, providing lessons learned and best practices. The report concludes with strategic recommendations for overcoming implementation challenges and suggests future directions for enhancing RMF research and practice. This investigation underscores the RMF's critical role in establishing robust cybersecurity compliance across executive federal agencies, offering valuable recommendations for policymakers, cybersecurity professionals, and governmental bodies.
- [224] arXiv:2405.07096 [pdf, ps, html, other]
-
Title: Multi-Relational Structural EntropyComments: Accepted to UAI 2024Subjects: Social and Information Networks (cs.SI); Information Theory (cs.IT)
Structural Entropy (SE) measures the structural information contained in a graph. Minimizing or maximizing SE helps to reveal or obscure the intrinsic structural patterns underlying graphs in an interpretable manner, finding applications in various tasks driven by networked data. However, SE ignores the heterogeneity inherent in the graph relations, which is ubiquitous in modern networks. In this work, we extend SE to consider heterogeneous relations and propose the first metric for multi-relational graph structural information, namely, Multi-relational Structural Entropy (MrSE). To this end, we first cast SE through the novel lens of the stationary distribution from random surfing, which readily extends to multi-relational networks by considering the choices of both nodes and relation types simultaneously at each step. The resulting MrSE is then optimized by a new greedy algorithm to reveal the essential structures within a multi-relational network. Experimental results highlight that the proposed MrSE offers a more insightful interpretation of the structure of multi-relational graphs compared to SE. Additionally, it enhances the performance of two tasks that involve real-world multi-relational graphs, including node clustering and social event detection.
- [225] arXiv:2405.07097 [pdf, ps, other]
-
Title: Diffusion models as probabilistic neural operators for recovering unobserved states of dynamical systemsComments: Preprint submitted to IEEE MLSP 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper explores the efficacy of diffusion-based generative models as neural operators for partial differential equations (PDEs). Neural operators are neural networks that learn a mapping from the parameter space to the solution space of PDEs from data, and they can also solve the inverse problem of estimating the parameter from the solution. Diffusion models excel in many domains, but their potential as neural operators has not been thoroughly explored. In this work, we show that diffusion-based generative models exhibit many properties favourable for neural operators, and they can effectively generate the solution of a PDE conditionally on the parameter or recover the unobserved parts of the system. We propose to train a single model adaptable to multiple tasks, by alternating between the tasks during training. In our experiments with multiple realistic dynamical systems, diffusion models outperform other neural operators. Furthermore, we demonstrate how the probabilistic diffusion model can elegantly deal with systems which are only partially identifiable, by producing samples corresponding to the different possible solutions.
- [226] arXiv:2405.07098 [pdf, ps, other]
-
Title: Interpretable global minima of deep ReLU neural networks on sequentially separable dataComments: AMS Latex, 22 pages, 3 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Mathematical Physics (math-ph); Optimization and Control (math.OC); Machine Learning (stat.ML)
We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.
- [227] arXiv:2405.07099 [pdf, ps, other]
-
Title: Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?Journal-ref: In Proceedings of EACL 2023, 849-864 (2023)Subjects: Computation and Language (cs.CL)
Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and different morphosyntactic properties. This ambiguity goes beyond word-sense disambiguation (WSD), and may include token segmentation into multiple word units. Previous research on MRLs claimed that standardly trained pre-trained language models (PLMs) based on word-pieces may not sufficiently capture the internal structure of such tokens in order to distinguish between these analyses. Taking Hebrew as a case study, we investigate the extent to which Hebrew homographs can be disambiguated and analyzed using PLMs. We evaluate all existing models for contextualized Hebrew embeddings on a novel Hebrew homograph challenge sets that we deliver. Our empirical results demonstrate that contemporary Hebrew contextualized embeddings outperform non-contextualized embeddings; and that they are most effective for disambiguating segmentation and morphosyntactic features, less so regarding pure word-sense disambiguation. We show that these embeddings are more effective when the number of word-piece splits is limited, and they are more effective for 2-way and 3-way ambiguities than for 4-way ambiguity. We show that the embeddings are equally effective for homographs of both balanced and skewed distributions, whether calculated as masked or unmasked tokens. Finally, we show that these embeddings are as effective for homograph disambiguation with extensive supervised training as with a few-shot setup.
- [228] arXiv:2405.07101 [pdf, ps, other]
-
Title: Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITASubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In the pursuit of advancing natural language processing for the Italian language, we introduce a state-of-the-art Large Language Model (LLM) based on the novel Meta LLaMA-3 model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA. We fine-tuned the original 8B parameters instruction tuned model using the Supervised Fine-tuning (SFT) technique on the English and Italian language datasets in order to improve the original performance. Consequently, a Dynamic Preference Optimization (DPO) process has been used to align preferences, avoid dangerous and inappropriate answers, and limit biases and prejudices. Our model leverages the efficiency of QLoRA to fine-tune the model on a smaller portion of the original model weights and then adapt the model specifically for the Italian linguistic structure, achieving significant improvements in both performance and computational efficiency. Concurrently, DPO is employed to refine the model's output, ensuring that generated content aligns with quality answers. The synergy between SFT, QLoRA's parameter efficiency and DPO's user-centric optimization results in a robust LLM that excels in a variety of tasks, including but not limited to text completion, zero-shot classification, and contextual understanding. The model has been extensively evaluated over standard benchmarks for the Italian and English languages, showing outstanding results. The model is freely available over the HuggingFace hub and, examples of use can be found in our GitHub repository. this https URL
- [229] arXiv:2405.07104 [pdf, ps, html, other]
-
Title: Uncertainty-Aware Shape Estimation of a Surgical Continuum Manipulator in Constrained Environments using Fiber Bragg Grating SensorsAlexander Schwarz, Arian Mehrfard, Golchehr Amirkhani, Henry Phalen, Justin H. Ma, Robert B. Grupp, Alejandro Martin-Gomez, Mehran ArmandSubjects: Robotics (cs.RO)
Continuum Dexterous Manipulators (CDMs) are well-suited tools for minimally invasive surgery due to their inherent dexterity and reachability. Nonetheless, their flexible structure and non-linear curvature pose significant challenges for shape-based feedback control. The use of Fiber Bragg Grating (FBG) sensors for shape sensing has shown great potential in estimating the CDM's tip position and subsequently reconstructing the shape using optimization algorithms. This optimization, however, is under-constrained and may be ill-posed for complex shapes, falling into local minima. In this work, we introduce a novel method capable of directly estimating a CDM's shape from FBG sensor wavelengths using a deep neural network. In addition, we propose the integration of uncertainty estimation to address the critical issue of uncertainty in neural network predictions. Neural network predictions are unreliable when the input sample is outside the training distribution or corrupted by noise. Recognizing such deviations is crucial when integrating neural networks within surgical robotics, as inaccurate estimations can pose serious risks to the patient. We present a robust method that not only improves the precision upon existing techniques for FBG-based shape estimation but also incorporates a mechanism to quantify the models' confidence through uncertainty estimation. We validate the uncertainty estimation through extensive experiments, demonstrating its effectiveness and reliability on out-of-distribution (OOD) data, adding an additional layer of safety and precision to minimally invasive surgical robotics.
- [230] arXiv:2405.07106 [pdf, ps, other]
-
Title: Real-Time Simulation of a Resilient Control Center for Inverter-Based MicrogridsComments: Accepted for publication at IECON 2024 - 50th Annual Conference of the IEEE Industrial Electronics SocietySubjects: Systems and Control (eess.SY)
The number of installed remote terminal units (RTU) is on the rise, increasing the observability and control of the power system. RTUs enable sending data to and receiving data from a control center in the power system. A distribution grid control center runs distribution management system (DMS) algorithms, where the DMS takes control actions during transients and outages, such as tripping a circuit breaker and disconnecting a controllable load to increase the resiliency of the grid. Relying on communication-based devices makes the control center vulnerable to cyberattacks, and attackers can send falsified data to the control center to cause disturbances or power outages. Previous work has conducted research on developing ways to detect a cyberattack and ways to mitigate the adverse effects of the attack. This work studies false data injection (FDI) attacks on the DMS algorithm of a fully inverter-based microgrid in real time. The fully inverter-based microgrid is simulated using an RTDS, an amplifier, an electronic load, a server, a network switch, and a router. The DMS is integrated into the server codes and exchanges data with RTDS through TCP/IP protocols. Moreover, a recurrent neural network (RNN) algorithm is used to detect and mitigate the cyberattack. The effectiveness of the detection and mitigation algorithm is tested under various scenarios using the real-time testbed.
- [231] arXiv:2405.07107 [pdf, ps, other]
-
Title: A Pair of Bayesian Network Structures has Undecidable Conditional IndependenciesComments: 13 pages, 2 figuresSubjects: Computational Complexity (cs.CC); Information Theory (cs.IT); Probability (math.PR)
Given a Bayesian network structure (directed acyclic graph), the celebrated d-separation algorithm efficiently determines whether the network structure implies a given conditional independence relation. We show that this changes drastically when we consider two Bayesian network structures instead. It is undecidable to determine whether two given network structures imply a given conditional independency, that is, whether every collection of random variables satisfying both network structures must also satisfy the conditional independency. Although the approximate combination of two Bayesian networks is a well-studied topic, our result shows that it is fundamentally impossible to accurately combine the knowledge of two Bayesian network structures, in the sense that no algorithm can tell what conditional independencies are implied by the two network structures. We can also explicitly construct two Bayesian network structures, such that whether they imply a certain conditional independency is unprovable in the ZFC set theory, assuming ZFC is consistent.
- [232] arXiv:2405.07108 [pdf, ps, other]
-
Title: Memory-Based Set Point Modulation for Improved Transient Response of Distributed Energy ResourcesComments: Accepted for publication at IECON 2024 - 50th Annual Conference of the IEEE Industrial Electronics SocietySubjects: Systems and Control (eess.SY)
As the composition of the power grid evolves to integrate more renewable generation, its reliance on distributed energy resources (DER) is increasing. Existing DERs are often controlled with proportional integral (PI) controllers that, if not properly tuned or if system parameters change, exhibit sluggish performance or large overshoot. The use of set point automatic adjustment with correction-enabled (SPAACE) with a linear predictor improves the transient response of these DERs without the need to access the PI controller parameters. The limitation of the existing SPAACE method is the high sampling rate needed for improved performance, which is not always practical. This paper proposes the addition of a memory term to the SPAACE with a linear predictor. This memory term is the integral of the errors of previous samples, which adds another layer to the prediction to improve the response at lower sampling rates and further reduces the overshoot and settling time compared to the existing SPAACE method. Time-domain simulation studies are performed in PSCAD/EMTDC to show the effectiveness of the proposed controller.
- [233] arXiv:2405.07111 [pdf, ps, html, other]
-
Title: Designing and Evaluating Dialogue LLMs for Co-Creative Improvised TheatreComments: 13 pages, 7 figures, accepted for publication at the International Conference on Computational Creativity 2024Subjects: Computation and Language (cs.CL)
Social robotics researchers are increasingly interested in multi-party trained conversational agents. With a growing demand for real-world evaluations, our study presents Large Language Models (LLMs) deployed in a month-long live show at the Edinburgh Festival Fringe. This case study investigates human improvisers co-creating with conversational agents in a professional theatre setting. We explore the technical capabilities and constraints of on-the-spot multi-party dialogue, providing comprehensive insights from both audience and performer experiences with AI on stage. Our human-in-the-loop methodology underlines the challenges of these LLMs in generating context-relevant responses, stressing the user interface's crucial role. Audience feedback indicates an evolving interest for AI-driven live entertainment, direct human-AI interaction, and a diverse range of expectations about AI's conversational competence and utility as a creativity support tool. Human performers express immense enthusiasm, varied satisfaction, and the evolving public opinion highlights mixed emotions about AI's role in arts.
- [234] arXiv:2405.07116 [pdf, ps, other]
-
Title: CoViews: Adaptive Augmentation Using Cooperative Views for Enhanced Contrastive LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
Data augmentation plays a critical role in generating high-quality positive and negative pairs necessary for effective contrastive learning. However, common practices involve using a single augmentation policy repeatedly to generate multiple views, potentially leading to inefficient training pairs due to a lack of cooperation between views. Furthermore, to find the optimal set of augmentations, many existing methods require extensive supervised evaluation, overlooking the evolving nature of the model that may require different augmentations throughout the training. Other approaches train differentiable augmentation generators, thus limiting the use of non-differentiable transformation functions from the literature. In this paper, we address these challenges by proposing a framework for learning efficient adaptive data augmentation policies for contrastive learning with minimal computational overhead. Our approach continuously generates new data augmentation policies during training and produces effective positives/negatives without any supervision. Within this framework, we present two methods: \ac{IndepViews}, which generates augmentation policies used across all views, and \ac{CoViews}, which generates dependent augmentation policies for each view. This enables us to learn dependencies between the transformations applied to each view and ensures that the augmentation strategies applied to different views complement each other, leading to more meaningful and discriminative representations. Through extensive experimentation on multiple datasets and contrastive learning frameworks, we demonstrate that our method consistently outperforms baseline solutions and that training with a view-dependent augmentation policy outperforms training with an independent policy shared across views, showcasing its effectiveness in enhancing contrastive learning performance.
- [235] arXiv:2405.07117 [pdf, ps, html, other]
-
Title: Context Neural Networks: A Scalable Multivariate Model for Time Series ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Real-world time series often exhibit complex interdependencies that cannot be captured in isolation. Global models that model past data from multiple related time series globally while producing series-specific forecasts locally are now common. However, their forecasts for each individual series remain isolated, failing to account for the current state of its neighbouring series. Multivariate models like multivariate attention and graph neural networks can explicitly incorporate inter-series information, thus addressing the shortcomings of global models. However, these techniques exhibit quadratic complexity per timestep, limiting scalability. This paper introduces the Context Neural Network, an efficient linear complexity approach for augmenting time series models with relevant contextual insights from neighbouring time series without significant computational overhead. The proposed method enriches predictive models by providing the target series with real-time information from its neighbours, addressing the limitations of global models, yet remaining computationally tractable for large datasets.
- [236] arXiv:2405.07119 [pdf, ps, other]
-
Title: Best-response Algorithms for Integer Convex Quadratic Simultaneous GamesSubjects: Computer Science and Game Theory (cs.GT)
We evaluate the best-response algorithm in the context of pure-integer convex quadratic games. We provide a sufficient condition that if certain interaction matrices (the product of the inverse of the positive definite matrix defining the convex quadratic terms and the matrix that connects one player's problem to another's) have all their singular values less than 1, then finite termination of the best-response algorithm is guaranteed regardless of the initial point. Termination is triggered through cycling among a finite number of strategies for each player. Our findings indicate that if cycling happens, a relaxed version of the Nash equilibrium can be calculated by identifying a Nash equilibrium of a smaller finite game. Conversely, we prove that if every singular value of the interaction matrices is greater than 1, the algorithm will diverge from a large family of initial points. In addition, we provide an infinite family of examples in which some of the singular values of the interaction matrices are greater than 1, cycling occurs, but any mixed-strategy with support in the strategies where cycling occurs has arbitrarily better deviations. Then, we perform computational tests of our algorithm and compare it with standard algorithms to solve such problems. We notice that our algorithm finds a Nash equilibrium correctly in every instance. Moreover, compared to a state-of-the art algorithm, our method shows similar performance in two-player games and significantly higher speed when involving three or more players.
- [237] arXiv:2405.07121 [pdf, ps, html, other]
-
Title: In The Wild Ellipse Parameter Estimation for Circular Dining Plates and BowlsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Ellipse estimation is an important topic in food image processing because it can be leveraged to parameterize plates and bowls, which in turn can be used to estimate camera view angles and food portion sizes. Automatically detecting the elliptical rim of plates and bowls and estimating their ellipse parameters for data "in-the-wild" is challenging: diverse camera angles and plate shapes could have been used for capture, noisy background, multiple non-uniform plates and bowls in the image could be present. Recent advancements in foundational models offer promising capabilities for zero-shot semantic understanding and object segmentation. However, the output mask boundaries for plates and bowls generated by these models often lack consistency and precision compared to traditional ellipse fitting methods. In this paper, we combine ellipse fitting with semantic information extracted by zero-shot foundational models and propose WildEllipseFit, a method to detect and estimate the elliptical rim for plate and bowl. Evaluation on the proposed Yummly-ellipse dataset demonstrates its efficacy and zero-shot capability in real-world scenarios.
- [238] arXiv:2405.07122 [pdf, ps, html, other]
-
Title: PCF Learned Sort: a Learning Augmented Sort Algorithm with $O(n \log\log n)$ Expected ComplexitySubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Sorting is one of the most fundamental algorithms in computer science. Recently, Learned Sorts, which use machine learning to improve sorting speed, have attracted attention. While existing studies show that Learned Sort is experimentally faster than classical sorting algorithms, they do not provide theoretical guarantees about its computational complexity. We propose PCF Learned Sort, a theoretically guaranteed Learned Sort algorithm. We prove that the expected complexity of PCF Learned Sort is $O(n \log \log n)$ under mild assumptions on the data distribution. We also confirm experimentally that PCF Learned Sort has a computational complexity of $O(n \log \log n)$ on both synthetic and real datasets. This is the first study to theoretically support the experimental success of Learned Sort, and provides evidence for why Learned Sort is fast.
- [239] arXiv:2405.07124 [pdf, ps, other]
-
Title: Vertex Shader Domain Warping with Automatic DifferentiationComments: 11 pages, 7 figuresSubjects: Graphics (cs.GR)
Domain warping is a technique commonly used in creative coding to distort graphics and add visual interest to a work. The approach has the potential to be used in 3D art as mesh vertices can be efficiently warped using a vertex shader in a WebGL pipeline. However, 3D models packaged for the web typically come with baked-in normal vectors, and these need to be updated when vertex positions change for lighting calculations to work. This is typically done via finite differences, which requires parameter tuning to achieve optimal visual fidelity. We present a method for 3D domain warping that works with automatic differentiation, allowing exact normals to be used without any tuning while still benefiting from hardware acceleration.
- [240] arXiv:2405.07128 [pdf, ps, other]
-
Title: 5G Virtual Reality Manipulator Teleoperation using a Mobile PhoneSubjects: Robotics (cs.RO)
This paper presents an approach to teleoperate a manipulator using a mobile phone as a leader device. Using its IMU and camera, the phone estimates its Cartesian pose which is then used to to control the Cartesian pose of the robot's tool. The user receives visual feedback in the form of multi-view video - a point cloud rendered in a virtual reality environment. This enables the user to observe the scene from any position. To increase immersion, the robot's estimate of external forces is relayed using the phone's haptic actuator. Leader and follower are connected through wireless networks such as 5G or Wi-Fi. The paper describes the setup and analyzes its performance.
- [241] arXiv:2405.07131 [pdf, ps, other]
-
Title: MAxPrototyper: A Multi-Agent Generation System for Interactive User Interface PrototypingSubjects: Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)
In automated user interactive design, designers face key challenges, including accurate representation of user intent, crafting high-quality components, and ensuring both aesthetic and semantic consistency. Addressing these challenges, we introduce MAxPrototyper, our human-centered, multi-agent system for interactive design generation. The core of MAxPrototyper is a theme design agent. It coordinates with specialized sub-agents, each responsible for generating specific parts of the design. Through an intuitive online interface, users can control the design process by providing text descriptions and layout. Enhanced by improved language and image generation models, MAxPrototyper generates each component with careful detail and contextual understanding. Its multi-agent architecture enables a multi-round interaction capability between the system and users, facilitating precise and customized design adjustments throughout the creation process.
- [242] arXiv:2405.07135 [pdf, ps, other]
-
Title: Combining multiple post-training techniques to achieve most efficient quantized LLMsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of two well-known post-training techniques, SmoothQuant and GPTQ, and provide a comprehensive analysis of their interactions and implications for advancing LLM quantization. We enhance the versatility of both techniques by enabling quantization to microscaling (MX) formats, expanding their applicability beyond their initial fixed-point format targets. We show that by applying GPTQ and SmoothQuant, and employing MX formats for quantizing models, we can achieve a significant reduction in the size of OPT models by up to 4x and LLaMA models by up to 3x with a negligible perplexity increase of 1-3%.
- [243] arXiv:2405.07139 [pdf, ps, html, other]
-
Title: Reduced Krylov Basis Methods for Parametric Partial Differential EquationsComments: 23 pages, 6 figuresSubjects: Numerical Analysis (math.NA)
This work is on a user-friendly reduced basis method for solving a family of parametric PDEs by preconditioned Krylov subspace methods including the conjugate gradient method, generalized minimum residual method, and bi-conjugate gradient method. The proposed methods use a preconditioned Krylov subspace method for a high-fidelity discretization of one parameter instance to generate orthogonal basis vectors of the reduced basis subspace. Then large-scale discrete parameter-dependent problems are approximately solved in the low-dimensional Krylov subspace. As shown in the theory and experiments, only a small number of Krylov subspace iterations are needed to simultaneously generate approximate solutions of a family of high-fidelity and large-scale systems in the reduced basis subspace. This reduces the computational cost dramatically because (1) to construct the reduced basis vectors, we only solve one large-scale problem in the high-fidelity level; and (2) the family of large-scale problems restricted to the reduced basis subspace have much smaller sizes.
- [244] arXiv:2405.07140 [pdf, ps, other]
-
Title: Edge Intelligence Optimization for Large Language Model Inference with Batching and QuantizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Generative Artificial Intelligence (GAI) is taking the world by storm with its unparalleled content creation ability. Large Language Models (LLMs) are at the forefront of this movement. However, the significant resource demands of LLMs often require cloud hosting, which raises issues regarding privacy, latency, and usage limitations. Although edge intelligence has long been utilized to solve these challenges by enabling real-time AI computation on ubiquitous edge resources close to data sources, most research has focused on traditional AI models and has left a gap in addressing the unique characteristics of LLM inference, such as considerable model size, auto-regressive processes, and self-attention mechanisms. In this paper, we present an edge intelligence optimization problem tailored for LLM inference. Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs. Furthermore, our approach aims to maximize the inference throughput via batch scheduling and joint allocation of communication and computation resources, while also considering edge resource constraints and varying user requirements of latency and accuracy. To address this NP-hard problem, we develop an optimal Depth-First Tree-Searching algorithm with online tree-Pruning (DFTSP) that operates within a feasible time complexity. Simulation results indicate that DFTSP surpasses other batching benchmarks in throughput across diverse user settings and quantization techniques, and it reduces time complexity by over 45% compared to the brute-force searching method.
- [245] arXiv:2405.07142 [pdf, ps, html, other]
-
Title: Cross-Domain Continual Learning via CLAMPComments: Under Review in Elsevier JournalSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Artificial neural networks, celebrated for their human-like cognitive learning abilities, often encounter the well-known catastrophic forgetting (CF) problem, where the neural networks lose the proficiency in previously acquired knowledge. Despite numerous efforts to mitigate CF, it remains the significant challenge particularly in complex changing environments. This challenge is even more pronounced in cross-domain adaptation following the continual learning (CL) setting, which is a more challenging and realistic scenario that is under-explored. To this end, this article proposes a cross-domain CL approach making possible to deploy a single model in such environments without additional labelling costs. Our approach, namely continual learning approach for many processes (CLAMP), integrates a class-aware adversarial domain adaptation strategy to align a source domain and a target domain. An assessor-guided learning process is put forward to navigate the learning process of a base model assigning a set of weights to every sample controlling the influence of every sample and the interactions of each loss function in such a way to balance the stability and plasticity dilemma thus preventing the CF problem. The first assessor focuses on the negative transfer problem rejecting irrelevant samples of the source domain while the second assessor prevents noisy pseudo labels of the target domain. Both assessors are trained in the meta-learning approach using random transformation techniques and similar samples of the source domain. Theoretical analysis and extensive numerical validations demonstrate that CLAMP significantly outperforms established baseline algorithms across all experiments by at least $10\%$ margin.
- [246] arXiv:2405.07145 [pdf, ps, other]
-
Title: Stable Signature is Unstable: Removing Image Watermark from Diffusion ModelsSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Watermark has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called \emph{Stable Signature} (proposed by Meta) roots watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by \emph{open-source} diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack to remove the watermark from a diffusion model by fine-tuning it. Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images. Our results highlight that Stable Signature is not as stable as previously thought.
- [247] arXiv:2405.07146 [pdf, ps, html, other]
-
Title: TRAIL: Cross-Shard Validation for Cryptocurrency Byzantine Shard ProtectionSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We present TRAIL: an algorithm that uses a novel consensus procedure to tolerate failed or malicious shards within a blockchain-based cryptocurrency. Our algorithm takes a new approach of selecting validator shards for each transaction from those that previously held the assets being transferred. This approach ensures the algorithm's robustness and efficiency. TRAIL is presented using PBFT for internal shard transaction processing and a modified version of PBFT for external cross-shard validation. We describe TRAIL, prove it correct, analyze its message complexity, and evaluate its performance. We propose various TRAIL optimizations: we describe how it can be adapted to other Byzantine-tolerant consensus algorithms, how a complete system may be built on the basis of it, and how TRAIL can be applied to existing and future sharded blockchains.
- [248] arXiv:2405.07147 [pdf, ps, other]
-
Title: Randomized algorithms for computing the tensor train approximation and their applicationsComments: 43 pages, 9 figures and 4 tablesSubjects: Numerical Analysis (math.NA)
In this paper, we focus on the fixed TT-rank and precision problems of finding an approximation of the tensor train (TT) decomposition of a tensor. Note that the TT-SVD and TT-cross are two well-known algorithms for these two problems. Firstly, by combining the random projection technique with the power scheme, we obtain two types of randomized algorithms for the fixed TT-rank problem. Secondly, by using the non-asymptotic theory of sub-random Gaussian matrices, we derive the upper bounds of the proposed randomized algorithms. Thirdly, we deduce a new deterministic strategy to estimate the desired TT-rank with a given tolerance and another adaptive randomized algorithm that finds a low TT-rank representation satisfying a given tolerance, and is beneficial when the target TT-rank is not known in advance. We finally illustrate the accuracy of the proposed algorithms via some test tensors from synthetic and real databases. In particular, for the fixed TT-rank problem, the proposed algorithms can be several times faster than the TT-SVD, and the accuracy of the proposed algorithms and the TT-SVD are comparable for several test tensors.
- [249] arXiv:2405.07151 [pdf, ps, other]
-
Title: Group Complete-$\{s\}$ Pliable Index CodingComments: Accepted for publication in 2024 IEEE International Symposium on Information TheorySubjects: Information Theory (cs.IT)
This paper introduces a novel class of PICOD($t$) problems referred to as $g$-group complete-$S$ PICOD($t$) problems. It constructs a multi-stage achievability scheme to generate pliable index codes for group complete PICOD problems when $S = \{s\}$ is a singleton set. Using the maximum acyclic induced subgraph bound, lower bounds on the broadcast rate are derived for singleton $S$, which establishes the optimality of the achievability scheme for a range of values for $t$ and for any $g$ and $s$. For all other values, it is shown that the achievability scheme is optimal among the restricted class of broadcast codes.
- [250] arXiv:2405.07155 [pdf, ps, other]
-
Title: Enhancing Multi-modal Learning: Meta-learned Cross-modal Knowledge Distillation for Handling Missing ModalitiesSubjects: Computer Vision and Pattern Recognition (cs.CV)
In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Hence, an important research question is if it is possible for trained multi-modal models to have high accuracy even when influential modalities are absent from the input data. In this paper, we propose a novel approach called Meta-learned Cross-modal Knowledge Distillation (MCKD) to address this research question. MCKD adaptively estimates the importance weight of each modality through a meta-learning process. These dynamically learned modality importance weights are used in a pairwise cross-modal knowledge distillation process to transfer the knowledge from the modalities with higher importance weight to the modalities with lower importance weight. This cross-modal knowledge distillation produces a highly accurate model even with the absence of influential modalities. Differently from previous methods in the field, our approach is designed to work in multiple tasks (e.g., segmentation and classification) with minimal adaptation. Experimental results on the Brain tumor Segmentation Dataset 2018 (BraTS2018) and the Audiovision-MNIST classification dataset demonstrate the superiority of MCKD over current state-of-the-art models. Particularly in BraTS2018, we achieve substantial improvements of 3.51\% for enhancing tumor, 2.19\% for tumor core, and 1.14\% for the whole tumor in terms of average segmentation Dice score.
- [251] arXiv:2405.07157 [pdf, ps, html, other]
-
Title: Semi-Self-Supervised Domain Adaptation: Developing Deep Learning Models with Limited Annotated Data for Wheat Head SegmentationComments: 12Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Precision agriculture involves the application of advanced technologies to improve agricultural productivity, efficiency, and profitability while minimizing waste and environmental impact. Deep learning approaches enable automated decision-making for many visual tasks. However, in the agricultural domain, variability in growth stages and environmental conditions, such as weather and lighting, presents significant challenges to developing deep learning-based techniques that generalize across different conditions. The resource-intensive nature of creating extensive annotated datasets that capture these variabilities further hinders the widespread adoption of these approaches. To tackle these issues, we introduce a semi-self-supervised domain adaptation technique based on deep convolutional neural networks with a probabilistic diffusion process, requiring minimal manual data annotation. Using only three manually annotated images and a selection of video clips from wheat fields, we generated a large-scale computationally annotated dataset of image-mask pairs and a large dataset of unannotated images extracted from video frames. We developed a two-branch convolutional encoder-decoder model architecture that uses both synthesized image-mask pairs and unannotated images, enabling effective adaptation to real images. The proposed model achieved a Dice score of 80.7\% on an internal test dataset and a Dice score of 64.8\% on an external test set, composed of images from five countries and spanning 18 domains, indicating its potential to develop generalizable solutions that could encourage the wider adoption of advanced technologies in agriculture.
- [252] arXiv:2405.07162 [pdf, ps, other]
-
Title: Learning Reward for Robot Skills Using Large Language Models via Self-AlignmentComments: ICML 2024Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Learning reward functions remains the bottleneck to equip a robot with a broad repertoire of skills. Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise, thus ineffective which requires to be further grounded with environment information. We proposed a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: We first use the LLM to propose features and parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement over training efficacy and efficiency, meanwhile consuming significantly fewer GPT tokens compared to the alternative mutation-based method.
- [253] arXiv:2405.07164 [pdf, ps, other]
-
Title: Modeling Pedestrian Intrinsic Uncertainty for Multimodal Stochastic Trajectory Prediction via Energy Plan DenoisingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoising (EPD) model for stochastic trajectory prediction. EPD initially provides a coarse estimation of the distribution of future trajectories, termed the Plan, utilizing the Langevin Energy Model. Subsequently, it refines this estimation through denoising via the Probabilistic Diffusion Model. By initiating denoising with the Plan, EPD effectively reduces the need for iterative steps, thereby enhancing efficiency. Furthermore, EPD differs from conventional approaches by modeling the distribution of trajectories instead of individual trajectories. This allows for the explicit modeling of pedestrian intrinsic uncertainties and eliminates the need for multiple denoising operations. A single denoising operation produces a distribution from which multiple samples can be drawn, significantly enhancing efficiency. Moreover, EPD's fine-tuning of the Plan contributes to improved model performance. We validate EPD on two publicly available datasets, where it achieves state-of-the-art results. Additionally, ablation experiments underscore the contributions of individual modules, affirming the efficacy of the proposed approach.
- [254] arXiv:2405.07166 [pdf, ps, other]
-
Title: Resource Efficient Perception for Vision SystemsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory efficient patch based processing for high resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods which are limited by memory constraints, our method enables training of ultra high resolution images. We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices like Jetson Nano. Our code is available at this https URL.
- [255] arXiv:2405.07167 [pdf, ps, other]
-
Title: 3D Hand Mesh Recovery from Monocular RGB in Camera SpaceComments: 21 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
With the rapid advancement of technologies such as virtual reality, augmented reality, and gesture control, users expect interactions with computer interfaces to be more natural and intuitive. Existing visual algorithms often struggle to accomplish advanced human-computer interaction tasks, necessitating accurate and reliable absolute spatial prediction methods. Moreover, dealing with complex scenes and occlusions in monocular images poses entirely new challenges. This study proposes a network model that performs parallel processing of root-relative grids and root recovery tasks. The model enables the recovery of 3D hand meshes in camera space from monocular RGB images. To facilitate end-to-end training, we utilize an implicit learning approach for 2D heatmaps, enhancing the compatibility of 2D cues across different subtasks. Incorporate the Inception concept into spectral graph convolutional network to explore relative mesh of root, and integrate it with the locally detailed and globally attentive method designed for root recovery exploration. This approach improves the model's predictive performance in complex environments and self-occluded scenes. Through evaluation on the large-scale hand dataset FreiHAND, we have demonstrated that our proposed model is comparable with state-of-the-art models. This study contributes to the advancement of techniques for accurate and reliable absolute spatial prediction in various human-computer interaction applications.
- [256] arXiv:2405.07169 [pdf, ps, html, other]
-
Title: Challenges and Opportunities for Large-Scale Exploration with Air-Ground Teams using SemanticsFernando Cladera, Ian D. Miller, Zachary Ravichandran, Varun Murali, Jason Hughes, M. Ani Hsieh, C. J. Taylor, Vijay KumarComments: 6 pages, 5 figresSubjects: Robotics (cs.RO)
One common and desirable application of robots is exploring potentially hazardous and unstructured environments. Air-ground collaboration offers a synergistic approach to addressing such exploration challenges. In this paper, we demonstrate a system for large-scale exploration using a team of aerial and ground robots. Our system uses semantics as lingua franca, and relies on fully opportunistic communications. We highlight the unique challenges from this approach, explain our system architecture and showcase lessons learned during our experiments. All our code is open-source, encouraging researchers to use it and build upon.
- [257] arXiv:2405.07171 [pdf, ps, other]
-
Title: Enhanced Online Test-time Adaptation with Feature-Weight Cosine AlignmentComments: 22 pages, 7 figures, 8 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
Online Test-Time Adaptation (OTTA) has emerged as an effective strategy to handle distributional shifts, allowing on-the-fly adaptation of pre-trained models to new target domains during inference, without the need for source data. We uncovered that the widely studied entropy minimization (EM) method for OTTA, suffers from noisy gradients due to ambiguity near decision boundaries and incorrect low-entropy predictions. To overcome these limitations, this paper introduces a novel cosine alignment optimization approach with a dual-objective loss function that refines the precision of class predictions and adaptability to novel domains. Specifically, our method optimizes the cosine similarity between feature vectors and class weight vectors, enhancing the precision of class predictions and the model's adaptability to novel domains. Our method outperforms state-of-the-art techniques and sets a new benchmark in multiple datasets, including CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet datasets, demonstrating high accuracy and robustness against diverse corruptions and domain shifts.
- [258] arXiv:2405.07172 [pdf, ps, other]
-
Title: Observability and Incident Response in Managed Serverless Environments Using Ontology-Based Log MonitoringLavi Ben-Shimol, Edita Grolman, Aviad Elyashar, Inbar Maimon, Dudu Mimran, Oleg Brodt, Martin Strassmann, Heiko Lehmann, Yuval Elovici, Asaf ShabtaiSubjects: Cryptography and Security (cs.CR)
In a fully managed serverless environment, the cloud service provider is responsible for securing the cloud infrastructure, thereby reducing the operational and maintenance efforts of application developers. However, this environment limits the use of existing cybersecurity frameworks and tools, which reduces observability and situational awareness capabilities (e.g., risk assessment, incident response). In addition, existing security frameworks for serverless applications do not generalize well to all application architectures and usually require adaptation, specialized expertise, etc. for use in fully managed serverless environments. In this paper, we introduce a three-layer security scheme for applications deployed in fully managed serverless environments. The first two layers involve a unique ontology based solely on serverless logs which is used to transform them into a unified application activity knowledge graph. In the third layer, we address the need for observability and situational awareness capabilities by implementing two situational awareness tools that utilizes the graph-based representation: 1) An incident response dashboard that leverages the ontology to visualize and examine application activity logs in the context of cybersecurity alerts. Our user study showed that the dashboard enabled participants to respond more accurately and quickly to new security alerts than the baseline tool. 2) A criticality of asset (CoA) risk assessment framework that enables efficient expert-based prioritization in cybersecurity contexts.
- [259] arXiv:2405.07174 [pdf, ps, other]
-
Title: CRSFL: Cluster-based Resource-aware Split Federated Learning for Continuous AuthenticationMohamad Wazzeh, Mohamad Arafeh, Hani Sami, Hakima Ould-Slimane, Chamseddine Talhi, Azzam Mourad, Hadi OtrokSubjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
In the ever-changing world of technology, continuous authentication and comprehensive access management are essential during user interactions with a device. Split Learning (SL) and Federated Learning (FL) have recently emerged as promising technologies for training a decentralized Machine Learning (ML) model. With the increasing use of smartphones and Internet of Things (IoT) devices, these distributed technologies enable users with limited resources to complete neural network model training with server assistance and collaboratively combine knowledge between different nodes. In this study, we propose combining these technologies to address the continuous authentication challenge while protecting user privacy and limiting device resource usage. However, the model's training is slowed due to SL sequential training and resource differences between IoT devices with different specifications. Therefore, we use a cluster-based approach to group devices with similar capabilities to mitigate the impact of slow devices while filtering out the devices incapable of training the model. In addition, we address the efficiency and robustness of training ML models by using SL and FL techniques to train the clients simultaneously while analyzing the overhead burden of the process. Following clustering, we select the best set of clients to participate in training through a Genetic Algorithm (GA) optimized on a carefully designed list of objectives. The performance of our proposed framework is compared to baseline methods, and the advantages are demonstrated using a real-life UMDAA-02-FD face detection dataset. The results show that CRSFL, our proposed approach, maintains high accuracy and reduces the overhead burden in continuous authentication scenarios while preserving user privacy.
- [260] arXiv:2405.07175 [pdf, ps, html, other]
-
Title: On-Demand Model and Client Deployment in Federated Learning with Deep Reinforcement LearningSubjects: Machine Learning (cs.LG)
In Federated Learning (FL), the limited accessibility of data from diverse locations and user types poses a significant challenge due to restricted user participation. Expanding client access and diversifying data enhance models by incorporating diverse perspectives, thereby enhancing adaptability. However, challenges arise in dynamic and mobile environments where certain devices may become inaccessible as FL clients, impacting data availability and client selection methods. To address this, we propose an On-Demand solution, deploying new clients using Docker Containers on-the-fly. Our On-Demand solution, employing Deep Reinforcement Learning (DRL), targets client availability and selection, while considering data shifts, and container deployment complexities. It employs an autonomous end-to-end solution for handling model deployment and client selection. The DRL strategy uses a Markov Decision Process (MDP) framework, with a Master Learner and a Joiner Learner. The designed cost functions represent the complexity of the dynamic client deployment and selection. Simulated tests show that our architecture can easily adjust to changes in the environment and respond to On-Demand requests. This underscores its ability to improve client availability, capability, accuracy, and learning efficiency, surpassing heuristic and tabular reinforcement learning solutions.
- [261] arXiv:2405.07176 [pdf, ps, html, other]
-
Title: Capacity Maximization for Base Station with Hybrid Fixed and Movable AntennasSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Six-dimensional movable antenna (6DMA) is an effective solution for enhancing wireless network capacity through the adjustment of both 3D positions and 3D rotations of distributed antennas/antenna surfaces. Although freely positioning/rotating 6DMA surfaces offers the greatest flexibility and thus highest capacity improvement, its implementation may be challenging in practice due to the drastic architecture change required for existing base stations (BSs), which predominantly adopt fixed-position antenna (FPA) arrays (e.g., sector antenna arrays). Thus, we introduce in this letter a new BS architecture called hybrid fixed and movable antennas (HFMA), which consists of both conventional FPA arrays and position/rotation-adjustable 6DMA surfaces. For ease of implementation, we consider that all 6DMA surfaces can rotate along a circular track above the FPA arrays. We aim to maximize the network capacity via optimizing the rotation angles of all 6DMA surfaces based on the users' spatial distribution. Since this problem is combinatorial and its optimal solution requires prohibitively high computational complexity via exhaustive search, we propose an alternative adaptive Markov Chain Monte Carlo based method to solve it more efficiently. Finally, we present simulation results that show significant performance gains achieved by our proposed design over various benchmark schemes.
- [262] arXiv:2405.07178 [pdf, ps, html, other]
-
Title: Hologram: Realtime Holographic Overlays via LiDAR Augmented ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Guided by the hologram technology of the infamous Star Wars franchise, I present an application that creates real-time holographic overlays using LiDAR augmented 3D reconstruction. Prior attempts involve SLAM or NeRFs which either require highly calibrated scenes, incur steep computation costs, or fail to render dynamic scenes. I propose 3 high-fidelity reconstruction tools that can run on a portable device, such as a iPhone 14 Pro, which can allow for metric accurate facial reconstructions. My systems enable interactive and immersive holographic experiences that can be used for a wide range of applications, including augmented reality, telepresence, and entertainment.
- [263] arXiv:2405.07180 [pdf, ps, html, other]
-
Title: Repairing Reed-Solomon Codes with Side InformationThi Xinh Dinh, Ba Thong Le, Son Hoang Dau, Serdar Boztas, Stanislav Kruglik, Han Mao Kiah, Emanuele Viterbo, Tuvi Etzion, Yeow Meng CheeSubjects: Information Theory (cs.IT)
We generalize the problem of recovering a lost/erased symbol in a Reed-Solomon code to the scenario in which some side information about the lost symbol is known. The side information is represented as a set $S$ of linearly independent combinations of the sub-symbols of the lost symbol. When $S = \varnothing$, this reduces to the standard problem of repairing a single codeword symbol. When $S$ is a set of sub-symbols of the erased one, this becomes the repair problem with partially lost/erased symbol. We first establish that the minimum repair bandwidth depends on $|S|$ and not the content of $S$ and construct a lower bound on the repair bandwidth of a linear repair scheme with side information $S$. We then consider the well-known subspace-polynomial repair schemes and show that their repair bandwidths can be optimized by choosing the right subspaces. Finally, we demonstrate several parameter regimes where the optimal bandwidths can be achieved for full-length Reed-Solomon codes.
- [264] arXiv:2405.07188 [pdf, ps, html, other]
-
Title: Deciding regular games: a playground for exponential time algorithmsSubjects: Computer Science and Game Theory (cs.GT)
Regular games form a well-established class of games for analysis and synthesis of reactive systems. They include coloured Muller games, McNaughton games, Muller games, Rabin games, and Streett games. These games are played on directed graphs $\mathcal G$ where Player 0 and Player 1 play by generating an infinite path $\rho$ through the graph. The winner is determined by specifications put on the set $X$ of vertices in $\rho$ that occur infinitely often. These games are determined, enabling the partitioning of $\mathcal G$ into two sets $W_0$ and $W_1$ of winning positions for Player 0 and Player 1, respectively. Numerous algorithms exist that decide specific instances of regular games, e.g., Muller games, by computing $W_0$ and $W_1$. In this paper we aim to find general principles for designing uniform algorithms that decide all regular games. For this we utilise various recursive and dynamic programming algorithms that leverage standard notions such as subgames and traps. Importantly, we show that our techniques improve or match the performances of existing algorithms for many instances of regular games.
- [265] arXiv:2405.07189 [pdf, ps, other]
-
Title: A hybrid meta-heuristic approach for channel estimation in OFDM MIMOJournal-ref: Journal of Gono Bishwabidyalay, Vol. 4, Issue. 1, PP. 224-236, 2023Subjects: Networking and Internet Architecture (cs.NI)
In wireless communication Multiple Input Multiple Output (MIMO) technology has brought significant improvement in service by adopting Orthogonal Frequency Division Multiplexing (OFDM), a digital modulation technique. To achieve great performance with MIMO efficiently gathering channel state information (CSI) plays a vital role. Among different approach of channel estimation techniques data-aided channel estimation is more reliable. The existing methods of data-aided channel estimation are Least Square (LS) and Minimum Mean Square Error (MMSE) methods which do not achieve a great performance. Moreover, MMSE is little complex and has higher computational cost. That is why many attempts have been done previously to optimize the methods with help of meta heuristics and also other ways. In this paper we have tried to optimize LS estimation with a combined algorithm of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The proposed algorithm has outperformed LS and MMSE. And it gives similar result if we optimize LS with standard PSO but in less numbers of iteration.
- [266] arXiv:2405.07193 [pdf, ps, other]
-
Title: Learning Design Preferences through Design Feature Extraction and Weighted EnsembleSubjects: Human-Computer Interaction (cs.HC)
Design is a factor that plays an important role in consumer purchase decisions. As the need for understanding and predicting various preferences for each customer increases along with the importance of mass customization, predicting individual design preferences has become a critical factor in product development. However, current methods for predicting design preferences have some limitations. Product design involves a vast amount of high-dimensional information, and personal design preference is a complex and heterogeneous area of emotion unique to each individual. To address these challenges, we propose an approach that utilizes dimensionality reduction model to transform design samples into low-dimensional feature vectors, enabling us to extract the key representational features of each design. For preference prediction models using feature vectors, by referring to the design preference tendencies of others, we can predict the individual-level design preferences more accurately. Our proposed framework overcomes the limitations of traditional methods to determine design preferences, allowing us to accurately identify design features and predict individual preferences for specific products. Through this framework, we can improve the effectiveness of product development and create personalized product recommendations that cater to the unique needs of each consumer.
- [267] arXiv:2405.07194 [pdf, ps, html, other]
-
Title: Differentiable Model Scaling using Differentiable TopkComments: Accepted by ICML 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Over the past few years, as large language models have ushered in an era of intelligence emergence, there has been an intensified focus on scaling networks. Currently, many network architectures are designed manually, often resulting in sub-optimal configurations. Although Neural Architecture Search (NAS) methods have been proposed to automate this process, they suffer from low search efficiency. This study introduces Differentiable Model Scaling (DMS), increasing the efficiency for searching optimal width and depth in networks. DMS can model both width and depth in a direct and fully differentiable way, making it easy to optimize. We have evaluated our DMS across diverse tasks, ranging from vision tasks to NLP tasks and various network architectures, including CNNs and Transformers. Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art NAS methods. Specifically, for image classification on ImageNet, our DMS improves the top-1 accuracy of EfficientNet-B0 and Deit-Tiny by 1.4% and 0.6%, respectively, and outperforms the state-of-the-art zero-shot NAS method, ZiCo, by 1.3% while requiring only 0.4 GPU days for searching. For object detection on COCO, DMS improves the mAP of Yolo-v8-n by 2.0%. For language modeling, our pruned Llama-7B outperforms the prior method with lower perplexity and higher zero-shot classification accuracy. We will release our code in the future.
- [268] arXiv:2405.07195 [pdf, ps, html, other]
-
Title: InsightNet: Structured Insight Mining from Customer FeedbackSandeep Sricharan Mukku, Manan Soni, Jitenkumar Rana, Chetan Aggarwal, Promod Yenigalla, Rashmi Patange, Shyam MohanComments: EMNLP 2023Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, a semantic similarity heuristic approach to generate labelled data and employs a multi-task insight extraction architecture by fine-tuning an LLM. InsightNet identifies granular actionable topics with customer sentiments and verbatim for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, which is an improvement of 11% F1-score over the previous best results. Additionally, InsightNet generalises well for unseen aspects and suggests new topics to be added to the taxonomy.
- [269] arXiv:2405.07196 [pdf, ps, html, other]
-
Title: Permissioned Blockchain-based Framework for Ranking Synthetic Data GeneratorsNarasimha Raghavan Veeraragavan, Mohammad Hossein Tabatabaei, Severin Elvatun, Vibeke Binz Vallevik, Siri Larønningen, Jan F NygårdSubjects: Databases (cs.DB); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Synthetic data generation is increasingly recognized as a crucial solution to address data related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates, the need for a robust evaluation framework to select a synthetic data generator becomes more pressing given the variety of options available. In this research study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties. Furthermore, our framework serves as a valuable tool for selecting the optimal synthetic data generators for specific needs while ensuring compliance with data protection principles.
- [270] arXiv:2405.07200 [pdf, ps, html, other]
-
Title: Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function ApproximationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Accurate approximation of complex nonlinear functions is a fundamental challenge across many scientific and engineering domains. Traditional neural network architectures often struggle to capture intricate patterns and irregularities present in high-dimensional functions. This paper introduces the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a novel approach that combines the theoretical foundations of the Kolmogorov-Arnold Theorem with the powerful approximation capabilities of Chebyshev polynomials. 1
- [271] arXiv:2405.07201 [pdf, ps, html, other]
-
Title: Building a Strong Pre-Training Baseline for Universal 3D Large-Scale PerceptionComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
An effective pre-training framework with universal 3D representations is extremely desired in perceiving large-scale dynamic scenes. However, establishing such an ideal framework that is both task-generic and label-efficient poses a challenge in unifying the representation of the same primitive across diverse scenes. The current contrastive 3D pre-training methods typically follow a frame-level consistency, which focuses on the 2D-3D relationships in each detached image. Such inconsiderate consistency greatly hampers the promising path of reaching an universal pre-training framework: (1) The cross-scene semantic self-conflict, i.e., the intense collision between primitive segments of the same semantics from different scenes; (2) Lacking a globally unified bond that pushes the cross-scene semantic consistency into 3D representation learning. To address above challenges, we propose a CSC framework that puts a scene-level semantic consistency in the heart, bridging the connection of the similar semantic segments across various scenes. To achieve this goal, we combine the coherent semantic cues provided by the vision foundation model and the knowledge-rich cross-scene prototypes derived from the complementary multi-modality information. These allow us to train a universal 3D pre-training model that facilitates various downstream tasks with less fine-tuning efforts. Empirically, we achieve consistent improvements over SOTA pre-training approaches in semantic segmentation (+1.4% mIoU), object detection (+1.0% mAP), and panoptic segmentation (+3.0% PQ) using their task-specific 3D network on nuScenes. Code is released at this https URL, hoping to inspire future research.
- [272] arXiv:2405.07202 [pdf, ps, html, other]
-
Title: Unified Video-Language Pre-training with Synchronized AudioSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two modalities. In this work, we propose an enhanced framework for Video-Language pre-training with Synchronized Audio, termed as VLSA, that can learn tri-modal representations in a unified self-supervised transformer. Specifically, our VLSA jointly aggregates embeddings of local patches and global tokens for video, text, and audio. Furthermore, we utilize local-patch masked modeling to learn modality-aware features, and leverage global audio matching to capture audio-guided features for video and text. We conduct extensive experiments on retrieval across text, video, and audio. Our simple model pre-trained on only 0.9M data achieves improving results against state-of-the-art baselines. In addition, qualitative visualizations vividly showcase the superiority of our VLSA in learning discriminative visual-textual representations.
- [273] arXiv:2405.07204 [pdf, ps, html, other]
-
Title: Transforming C++11 Code to C++03 to Support Legacy Compilation EnvironmentsSubjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
Newer technologies - programming languages, environments, libraries - change very rapidly. However, various internal and external constraints often prevent projects from quickly adopting to these changes. Customers may require specific platform compatibility from a software vendor, for example. In this work, we deal with such an issue in the context of the C++ programming language. Our industrial partner is required to use SDKs that support only older C++ language editions. They, however, would like to allow their developers to use the newest language constructs in their code. To address this problem, we created a source code transformation framework to automatically backport source code written according to the C++11 standard to its functionally equivalent C++03 variant. With our framework developers are free to exploit the latest language features, while production code is still built by using a restricted set of available language constructs. This paper reports on the technical details of the transformation engine, and our experiences in applying it on two large industrial code bases and four open-source systems. Our solution is freely available and open-source.
- [274] arXiv:2405.07206 [pdf, ps, html, other]
-
Title: Static JavaScript Call Graphs: A Comparative StudySubjects: Software Engineering (cs.SE)
The popularity and wide adoption of JavaScript both at the client and server side makes its code analysis more important than ever before. Most of the algorithms for vulnerability analysis, coding issue detection, or type inference rely on the call graph representation of the underlying program. Despite some obvious advantages of dynamic analysis, static algorithms should also be considered for call graph construction as they do not require extensive test beds for programs and their costly execution and tracing. In this paper, we systematically compare five widely adopted static algorithms - implemented by the npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph, and Type Analyzer for JavaScript tools - for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs and 6 real-world Node.js modules. We provide a performance analysis as well as a quantitative and qualitative evaluation of the results. We found that there was a relatively large intersection of the found call edges among the algorithms, which proved to be 100 precise. However, most of the tools found edges that were missed by all others. ACG had the highest precision followed immediately by TAJS, but ACG found significantly more call edges. As for the combination of tools, ACG and TAJS together covered 99% of the found true edges by all algorithms, while maintaining a precision as high as 98%. Only two of the tools were able to analyze up-to-date multi-file Node.js modules due to incomplete language features support. They agreed on almost 60% of the call edges, but each of them found valid edges that the other missed.
- [275] arXiv:2405.07210 [pdf, ps, html, other]
-
Title: A complete pair of solvents of a quadratic matrix pencilComments: 24 pages, 16 figuresSubjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS); Functional Analysis (math.FA); Spectral Theory (math.SP)
Let $B$ and $C$ be square complex matrices. The differential equation \begin{equation*} x''(t)+Bx'(t)+Cx(t)=f(t) \end{equation*} is considered. A solvent is a matrix solution $X$ of the equation $X^2+BX+C=\mathbf0$. A pair of solvents $X$ and $Z$ is called complete if the matrix $X-Z$ is invertible. Knowing a complete pair of solvents $X$ and $Z$ allows us to reduce the solution of the initial value problem to the calculation of two matrix exponentials $e^{Xt}$ and $e^{Zt}$. The problem of finding a complete pair $X$ and $Z$, which leads to small rounding errors in solving the differential equation, is discussed.
- [276] arXiv:2405.07212 [pdf, ps, html, other]
-
Title: Enhancing Decision-Making in Optimization through LLM-Assisted Inference: A Neural Networks PerspectiveComments: Accepted IJCNNSubjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
This paper explores the seamless integration of Generative AI (GenAI) and Evolutionary Algorithms (EAs) within the domain of large-scale multi-objective optimization. Focusing on the transformative role of Large Language Models (LLMs), our study investigates the potential of LLM-Assisted Inference to automate and enhance decision-making processes. Specifically, we highlight its effectiveness in illuminating key decision variables in evolutionarily optimized solutions while articulating contextual trade-offs. Tailored to address the challenges inherent in inferring complex multi-objective optimization solutions at scale, our approach emphasizes the adaptive nature of LLMs, allowing them to provide nuanced explanations and align their language with diverse stakeholder expertise levels and domain preferences. Empirical studies underscore the practical applicability and impact of LLM-Assisted Inference in real-world decision-making scenarios.
- [277] arXiv:2405.07213 [pdf, ps, html, other]
-
Title: Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript FunctionsSubjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how the state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures.
- [278] arXiv:2405.07216 [pdf, ps, other]
-
Title: Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the StomachComments: IEEE ICRA 2024Subjects: Systems and Control (eess.SY)
Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (PDT) emerges as a promising prospect in this context. This study presents the development and implementation of a magnetically-guided origami robot, incorporating flexible printed circuit units for sustained and stable phototherapy of Helicobacter pylori. Each integrated unit is equipped with wireless charging capabilities, producing an optimal power output that can concurrently illuminate up to 15 LEDs at their maximum intensity. Crucially, these units can be remotely manipulated via a magnetic field, facilitating both translational and rotational movements. We propose an open-loop manual control sequence that allows the formation of a stable, compliant triangular structure through the interaction of internal magnets. This adaptable configuration is uniquely designed to withstand the dynamic squeezing environment prevalent in real-world gastric applications. The research herein represents a significant stride in leveraging technology for innovative medical solutions, particularly in the management of antibiotic-resistant Helicobacter pylori infections.
- [279] arXiv:2405.07220 [pdf, ps, html, other]
-
Title: On Discovery of Local Independence over Continuous Variables via Neural Contextual DecompositionComments: Conference on Causal Learning and Reasoning (CLeaR), 2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Conditional independence provides a way to understand causal relationships among the variables of interest. An underlying system may exhibit more fine-grained causal relationships especially between a variable and its parents, which will be called the local independence relationships. One of the most widely studied local relationships is Context-Specific Independence (CSI), which holds in a specific assignment of conditioned variables. However, its applicability is often limited since it does not allow continuous variables: data conditioned to the specific value of a continuous variable contains few instances, if not none, making it infeasible to test independence. In this work, we define and characterize the local independence relationship that holds in a specific set of joint assignments of parental variables, which we call context-set specific independence (CSSI). We then provide a canonical representation of CSSI and prove its fundamental properties. Based on our theoretical findings, we cast the problem of discovering multiple CSSI relationships in a system as finding a partition of the joint outcome space. Finally, we propose a novel method, coined neural contextual decomposition (NCD), which learns such partition by imposing each set to induce CSSI via modeling a conditional distribution. We empirically demonstrate that the proposed method successfully discovers the ground truth local independence relationships in both synthetic dataset and complex system reflecting the real-world physical dynamics.
- [280] arXiv:2405.07223 [pdf, ps, html, other]
-
Title: Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement LearningComments: Accepted by Science China Information SciencesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties in exploration. Recently, offline RL provides a promising solution by giving an initialized offline policy, which can be refined through online interactions. However, existing approaches primarily perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online-adaptation for several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve the performance for the new task by online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaption of Q functions towards new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.
- [281] arXiv:2405.07224 [pdf, ps, other]
-
Title: A geometric decomposition of finite games: Convergence vs. recurrence under no-regret learningComments: 50 pages, 16 figuresSubjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Optimization and Control (math.OC)
In view of the complexity of the dynamics of no-regret learning in games, we seek to decompose a finite game into simpler components where the day-to-day behavior of the dynamics is well understood. A natural starting point for this is Helmholtz's theorem, which resolves a vector field into a potential and an incompressible component. However, the geometry of no-regret dynamics - and, in particular, the dynamics of exponential / multiplicative weights (EW) schemes - is not compatible with the Euclidean underpinnings of Helmholtz's theorem, leading us to consider a Riemannian framework based on the Shahshahani metric. Using this geometric construction, we introduce the class of incompressible games, and we prove the following results: First, in addition to being volume-preserving, the continuous-time EW dynamics in incompressible games admit a constant of motion and are Poincaré recurrent - i.e., almost every trajectory of play comes arbitrarily close to its starting point infinitely often. Second, we establish a deep connection with a well-known decomposition of games into a potential and harmonic component (where the players' objectives are aligned and anti-aligned respectively): a game is incompressible if and only if it is harmonic, implying in turn that the EW dynamics lead to Poincaré recurrence in harmonic games.
- [282] arXiv:2405.07229 [pdf, ps, other]
-
Title: MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning TasksXiaocui Yang, Wenfang Wu, Shi Feng, Ming Wang, Daling Wang, Yang Li, Qi Sun, Yifei Zhang, Xiaoming Fu, Soujanya PoriaComments: Under review, the new version of MM-BigBench: arXiv:2310.09036Subjects: Multimedia (cs.MM)
The rising popularity of multimodal large language models (MLLMs) has sparked a significant increase in research dedicated to evaluating these models. However, current evaluation studies predominantly concentrate on the ability of models to comprehend and reason within a unimodal (vision-only) context, overlooking critical performance evaluations in complex multimodal reasoning tasks that integrate both visual and text contexts. Furthermore, tasks that demand reasoning across multiple modalities pose greater challenges and require a deep understanding of multimodal contexts. In this paper, we introduce a comprehensive assessment framework named MM-InstructEval, which integrates a diverse array of metrics to provide an extensive evaluation of the performance of various models and instructions across a broad range of multimodal reasoning tasks with vision-text contexts. MM-InstructEval enhances the research on the performance of MLLMs in complex multimodal reasoning tasks, facilitating a more thorough and holistic zero-shot evaluation of MLLMs. We firstly utilize the "Best Performance" metric to determine the upper performance limit of each model across various datasets. The "Mean Relative Gain" metric provides an analysis of the overall performance across different models and instructions, while the "Stability" metric evaluates their sensitivity to variations. Historically, the research has focused on evaluating models independently or solely assessing instructions, overlooking the interplay between models and instructions. To address this gap, we introduce the "Adaptability" metric, designed to quantify the degree of adaptability between models and instructions. Evaluations are conducted on 31 models (23 MLLMs) across 16 multimodal datasets, covering 6 tasks, with 10 distinct instructions. The extensive analysis enables us to derive novel insights.
- [283] arXiv:2405.07232 [pdf, ps, other]
-
Title: A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS DetectionSubjects: Cryptography and Security (cs.CR)
Distributed Denial of Service (DDoS) attacks are getting increasingly harmful to the Internet, showing no signs of slowing down. Developing an accurate detection mechanism to thwart DDoS attacks is still a big challenge due to the rich variety of these attacks and the emergence of new attack vectors. In this paper, we propose a new tree-based DDoS detection approach that operates on a flow as a stream structure, rather than the traditional fixed-size record structure containing aggregated flow statistics. Although aggregated flow records have gained popularity over the past decade, providing an effective means for flow-based intrusion detection by inspecting only a fraction of the total traffic volume, they are inherently constrained. Their detection precision is limited not only by the lack of packet payloads, but also by their structure, which is unable to model fine-grained inter-packet relations, such as packet order and temporal relations. Additionally, inferring aggregated flow statistics must wait for the complete flow to end. Here we show that considering flow inputs as variable-length streams composed of their associated packet headers, allows for very accurate and fast detection of malicious flows. We evaluate our proposed strategy on the CICDDoS2019 and CICIDS2017 datasets, which contain a comprehensive variety of DDoS attacks. Our approach matches or exceeds existing machine learning techniques' accuracy, including state-of-the-art deep learning methods. Furthermore, our method achieves significantly earlier detection, e.g., with CICDDoS2019 detection based on the first 2 packets, which corresponds to an average time-saving of 99.79% and uses only 4--6% of the traffic volume.
- [284] arXiv:2405.07233 [pdf, ps, html, other]
-
Title: OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep LearningBin Lu, Ze Zhao, Luyu Han, Xiaoying Gan, Yuntao Zhou, Lei Zhou, Luoyi Fu, Xinbing Wang, Chenghu Zhou, Jing ZhangComments: Accepted to ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Accurately reconstructing the global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystem. Existing expert-dominated numerical simulations fail to catch up with the dynamic variation caused by global warming and human activities. Besides, due to the high-cost data collection, the historical observations are severely sparse, leading to big challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep learning based model, to reconstruct the global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message-passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared with in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77%, demonstrating a promising potential to understand the "breathless ocean" in data-driven manner.
- [285] arXiv:2405.07236 [pdf, ps, html, other]
-
Title: Adaptive control of recurrent neural networks using conceptorsSubjects: Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO)
Recurrent Neural Networks excel at predicting and generating complex high-dimensional temporal patterns. Due to their inherent nonlinear dynamics and memory, they can learn unbounded temporal dependencies from data. In a Machine Learning setting, the network's parameters are adapted during a training phase to match the requirements of a given task/problem increasing its computational capabilities. After the training, the network parameters are kept fixed to exploit the learned computations. The static parameters thereby render the network unadaptive to changing conditions, such as external or internal perturbation. In this manuscript, we demonstrate how keeping parts of the network adaptive even after the training enhances its functionality and robustness. Here, we utilize the conceptor framework and conceptualize an adaptive control loop analyzing the network's behavior continuously and adjusting its time-varying internal representation to follow a desired target. We demonstrate how the added adaptivity of the network supports the computational functionality in three distinct tasks: interpolation of temporal patterns, stabilization against partial network degradation, and robustness against input distortion. Our results highlight the potential of adaptive networks in machine learning beyond training, enabling them to not only learn complex patterns but also dynamically adjust to changing environments, ultimately broadening their applicability.
- [286] arXiv:2405.07237 [pdf, ps, html, other]
-
Title: Soft Contact Simulation and Manipulation Learning of Deformable Objects with Vision-based Tactile SensorJianhua Shan, Yuhao Sun, Shixin Zhang, Fuchun Sun, Zixi Chen, Zirong Shen, Cesare Stefanini, Yiyong Yang, Shan Luo, Bin FangSubjects: Robotics (cs.RO)
Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex due to the deformation properties including elastic, plastic, and elastoplastic deformation. In this paper, we describe a new deformable object manipulation method including soft contact simulation, manipulation learning, and sim-to-real transfer. We propose a novel approach utilizing Vision-Based Tactile Sensors (VBTSs) as the end-effector in simulation to produce observations like relative position, squeezed area, and object contour, which are transferable to real robots. For a more realistic contact simulation, a new simulation environment including elastic, plastic, and elastoplastic deformations is created. We utilize RL strategies to train agents in the simulation, and expert demonstrations are applied for challenging tasks. Finally, we build a real experimental platform to complete the sim-to-real transfer and achieve a 90% success rate on difficult tasks such as cylinder and sphere. To test the robustness of our method, we use plasticine of different hardness and sizes to repeat the tasks including cylinder and sphere. The experimental results show superior performances of deformable object manipulation with the proposed method.
- [287] arXiv:2405.07241 [pdf, ps, other]
-
Title: Case Study of Novelty, Complexity, and Adaptation in a Multicellular SystemSubjects: Neural and Evolutionary Computing (cs.NE)
Continuing generation of novelty, complexity, and adaptation are well-established as core aspects of open-ended evolution. However, it has yet to be firmly established to what extent these phenomena are coupled and by what means they interact. In this work, we track the co-evolution of novelty, complexity, and adaptation in a case study from the DISHTINY simulation system, which is designed to study the evolution of digital multicellularity. In this case study, we describe ten qualitatively distinct multicellular morphologies, several of which exhibit asymmetrical growth and distinct life stages. We contextualize the evolutionary history of these morphologies with measurements of complexity and adaptation. Our case study suggests a loose -- sometimes divergent -- relationship can exist among novelty, complexity, and adaptation.
- [288] arXiv:2405.07244 [pdf, ps, other]
-
Title: Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation MetricsSubjects: Software Engineering (cs.SE)
Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function level JavaScript bug prediction model based on static source code metrics with the addition of a hybrid (static and dynamic) code analysis based metric of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using a purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2-10% increase in model performances (i.e., precision, recall, F-measure). Interestingly, replacing static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performances; however, using them all together yields the best results.
- [289] arXiv:2405.07248 [pdf, ps, other]
-
Title: Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric AnalysisSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
The humanlike responses of large language models (LLMs) have prompted social scientists to investigate whether LLMs can be used to simulate human participants in experiments, opinion polls and surveys. Of central interest in this line of research has been mapping out the psychological profiles of LLMs by prompting them to respond to standardized questionnaires. The conflicting findings of this research are unsurprising given that mapping out underlying, or latent, traits from LLMs' text responses to questionnaires is no easy task. To address this, we use psychometrics, the science of psychological measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs. We used two kinds of persona descriptions: either generic (four or five random person descriptions) or specific (mostly demographics of actual humans from a large-scale human dataset). We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties, similar to human norms, but the data from both LLMs when using specific demographic profiles, show poor psychometrics properties. We conclude that, currently, when LLMs are asked to simulate silicon personas, their responses are poor signals of potentially underlying latent traits. Thus, our work casts doubt on LLMs' ability to simulate individual-level human behaviour across multiple-choice question answering tasks.
- [290] arXiv:2405.07250 [pdf, ps, other]
-
Title: Towards Cloud Efficiency with Large-scale Workload CharacterizationComments: 6 figures, 13 TablesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cloud providers introduce features (e.g., Spot VMs, Harvest VMs, and Burstable VMs) and optimizations (e.g., oversubscription, auto-scaling, power harvesting, and overclocking) to improve efficiency and reliability. To effectively utilize these features, it's crucial to understand the characteristics of workloads running in the cloud. However, workload characteristics can be complex and depend on multiple signals, making manual characterization difficult and unscalable. In this study, we conduct the first large-scale examination of first-party workloads at Microsoft to understand their characteristics. Through an empirical study, we aim to answer the following questions: (1) What are the critical workload characteristics that impact efficiency and reliability on cloud platforms? (2) How do these characteristics vary across different workloads? (3) How can cloud platforms leverage these insights to efficiently characterize all workloads at scale? This study provides a deeper understanding of workload characteristics and their impact on cloud performance, which can aid in optimizing cloud services. Additionally, it identifies potential areas for future research.
- [291] arXiv:2405.07252 [pdf, ps, other]
-
Title: Universal Batch Learning Under The Misspecification SettingSubjects: Machine Learning (cs.LG); Information Theory (cs.IT)
In this paper we consider the problem of universal {\em batch} learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is requested to predict a probability distribution for the next outcome and a log-loss is incurred. The universal learner performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information theoretical tools, we derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret. We show that this regret can be considered as a constrained version of the conditional capacity between the data and its generating distributions set. We present tight bounds for this min-max regret, implying that the complexity of the problem is dominated by the richness of the hypotheses models $\Theta$ and not by the data generating distributions set $\Phi$. We develop an extension to the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameters multinomial distributions while the hypothesis class $\Theta$ is only a subset of this family of distributions.
- [292] arXiv:2405.07257 [pdf, ps, other]
-
Title: Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head GenerationChangpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, Jing Xiao, Yuanxu Chen, Lei Dai, Feiyu ZhuSubjects: Computer Vision and Pattern Recognition (cs.CV)
Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and cannot be applied to arbitrary subjects. In this paper, we propose a one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from general Talking Face Generation by enabling emotional and postural control. Specifically, we introduce the Inter-Reconstructed Feature Disentanglement (IRFD) method to decouple human facial features into three latent spaces. We then design a face editing module that modifies speech content and facial latent codes into a single latent space. Subsequently, we present a novel generator that employs modified latent codes derived from the editing module to regulate emotional expression, head poses, and speech content in synthesizing facial animations. Extensive trials demonstrate that our method can generate realistic talking head with coordinated lip motions, authentic facial emotions, and smooth head movements. The demo video is available at the anonymous link: https://anonymous.4open.science/r/SPEAK-F56E
- [293] arXiv:2405.07259 [pdf, ps, other]
-
Title: CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling ToolComments: Available at this https URL. Published in ISPASS 2024Subjects: Hardware Architecture (cs.AR)
Compute-In-Memory (CiM) is a promising solution to accelerate Deep Neural Networks (DNNs) as it can avoid energy-intensive DNN weight movement and use memory arrays to perform low-energy, high-density computations. These benefits have inspired research across the CiM stack, but CiM research often focuses on only one level of the stack (i.e., devices, circuits, architecture, workload, or mapping) or only one design point (e.g., one fabricated chip). There is a need for a full-stack modeling tool to evaluate design decisions in the context of full systems (e.g., see how a circuit impacts system energy) and to perform rapid early-stage exploration of the CiM co-design space.
To address this need, we propose CiMLoop: an open-source tool to model diverse CiM systems and explore decisions across the CiM stack. CiMLoop introduces (1) a flexible specification that lets users describe, model, and map workloads to both circuits and architecture, (2) an accurate energy model that captures the interaction between DNN operand values, hardware data representations, and analog/digital values propagated by circuits, and (3) a fast statistical model that can explore the design space orders-of-magnitude more quickly than other high-accuracy models.
Using CiMLoop, researchers can evaluate design choices at different levels of the CiM stack, co-design across all levels, fairly compare different implementations, and rapidly explore the design space. - [294] arXiv:2405.07260 [pdf, ps, other]
-
Title: A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion RecognitionComments: 5 pages, 3 figures, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SICLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentiallyn improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining self-supervised contrastive learning loss and supervised classification loss. This model optimizes both loss functions, capturing subtle EEG signal differences specific to emotion detection. Extensive experiments demonstrate SI-CLEER's robustness and superior accuracy on the SEED dataset compared to state-of-the-art methods. Furthermore, we analyze electrode performance, highlighting the significance of central frontal and temporal brain region EEGs in emotion detection. This study offers an universally applicable approach with potential benefits for diverse EEG classification tasks.
- [295] arXiv:2405.07263 [pdf, ps, other]
-
Title: Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase MiningSubjects: Computation and Language (cs.CL)
Dense vector representations for sentences made significant progress in recent years as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter challenges for effective use of dense representations. We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector, is not sufficient for effective phrase retrieval. We therefore look into the notion of representing multiple, sub-sentence, consecutive word spans, each with its own dense vector. We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations. Accordingly, we make an argument for contextualized word/token embeddings that can be aggregated for arbitrary word spans while maintaining the span's semantic meaning. We introduce a modification to the common contrastive loss used for sentence embeddings that encourages word embeddings to have this property. To demonstrate the effect of this method we present a dataset based on the STS-B dataset with additional generated text, that requires finding the best matching paraphrase residing in a larger context and report the degree of similarity to the origin phrase. We demonstrate on this dataset, how our proposed method can achieve better results without significant increase to compute.
- [296] arXiv:2405.07264 [pdf, ps, other]
-
Title: Information Rates Over Multi-View ChannelsComments: 33 pages, 1 figure, submitted to the IEEESubjects: Information Theory (cs.IT)
We investigate the fundamental limits of reliable communication over multi-view channels, in which the channel output is comprised of a large number of independent noisy views of a transmitted symbol. We consider first the setting of multi-view discrete memoryless channels and then extend our results to general multi-view channels (using multi-letter formulas). We argue that the channel capacity and dispersion of such multi-view channels converge exponentially fast in the number of views to the entropy and varentropy of the input distribution, respectively. We identify the exact rate of convergence as the smallest Chernoff information between two conditional distributions of the output, conditioned on unequal inputs. For the special case of the deletion channel, we compute upper bounds on this Chernoff information. Finally, we present a new channel model we term the Poisson approximation channel -- of possible independent interest -- whose capacity closely approximates the capacity of the multi-view binary symmetric channel for any fixed number of views.
- [297] arXiv:2405.07265 [pdf, ps, other]
-
Title: An Approach for Decentralized Authentication in Networks of UAVsComments: 5 pagesJournal-ref: Proc of the 12th International Conference on Cloud Computing, GRIDs, and Virtualization (Cloud Computing 2021), Porto Portugal, April 2021, pp. 13-17, ISSN 2308-4294Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
We propose a decentralized authentication system for networks of unmanned aerial vehicles. A blockchain-based public key infrastructure allows the usage of public key cryptography and public key based authentication protocols. The blockchain provides a common storage of the public keys and their relations and can provide the required information for the authentication process. Furthermore, the unmanned aerial vehicles store selected parts of the blockchain in order to operate independently in areas where they might not have access to the Internet. This allows unmanned aerial vehicles to authenticate entities of the network, like other unmanned aerial vehicles, cloud services, cars, and any computer.
- [298] arXiv:2405.07266 [pdf, ps, other]
-
Title: Architecture-Level Modeling of Photonic Deep Neural Network AcceleratorsComments: Published at ISPASS 2024Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR)
Photonics is a promising technology to accelerate Deep Neural Networks as it can use optical interconnects to reduce data movement energy and it enables low-energy, high-throughput optical-analog computations.
To realize these benefits in a full system (accelerator + DRAM), designers must ensure that the benefits of using the electrical, optical, analog, and digital domains exceed the costs of converting data between domains. Designers must also consider system-level energy costs such as data fetch from DRAM. Converting data and accessing DRAM can consume significant energy, so to evaluate and explore the photonic system space, there is a need for a tool that can model these full-system considerations.
In this work, we show that similarities between Compute-in-Memory (CiM) and photonics let us use CiM system modeling tools to accurately model photonics systems. Bringing modeling tools to photonics enables evaluation of photonic research in a full-system context, rapid design space exploration, co-design, and comparison between systems.
Using our open-source model, we show that cross-domain conversion and DRAM can consume a significant portion of photonic system energy. We then demonstrate optimizations that reduce conversions and DRAM accesses to improve photonic system energy efficiency by up to 3x. - [299] arXiv:2405.07267 [pdf, ps, other]
-
Title: Fields, Bridges, and Foundations: How Researchers Browse Citation Network VisualizationsSubjects: Human-Computer Interaction (cs.HC)
Visualizing citation relations with network structures is widely used, but the visual complexity can make it challenging for individual researchers to navigate through them. We collected data from 18 researchers using an interface that we designed using network simplification methods and analyzed how users browsed and identified important papers. Our analysis reveals six major patterns used for identifying papers of interest, which can be categorized into three key components: Fields, Bridges, and Foundations, each viewed from two distinct perspectives: layout-oriented and connection-oriented. The connection-oriented approach was found to be more effective for selecting relevant papers, but the layout-oriented method was adopted more often, even though it led to unexpected results and user frustration. Our findings emphasize the importance of integrating these components and the necessity to balance visual layouts with meaningful connections to enhance the effectiveness of citation networks in academic browsing systems.
- [300] arXiv:2405.07272 [pdf, ps, other]
-
Title: MAML MOT: Multiple Object Tracking based on Meta-LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
With the advancement of video analysis technology, the multi-object tracking (MOT) problem in complex scenes involving pedestrians is gaining increasing importance. This challenge primarily involves two key tasks: pedestrian detection and re-identification. While significant progress has been achieved in pedestrian detection tasks in recent years, enhancing the effectiveness of re-identification tasks remains a persistent challenge. This difficulty arises from the large total number of pedestrian samples in multi-object tracking datasets and the scarcity of individual instance samples. Motivated by recent rapid advancements in meta-learning techniques, we introduce MAML MOT, a meta-learning-based training approach for multi-object tracking. This approach leverages the rapid learning capability of meta-learning to tackle the issue of sample scarcity in pedestrian re-identification tasks, aiming to improve the model's generalization performance and robustness. Experimental results demonstrate that the proposed method achieves high accuracy on mainstream datasets in the MOT Challenge. This offers new perspectives and solutions for research in the field of pedestrian multi-object tracking.
- [301] arXiv:2405.07274 [pdf, ps, other]
-
Title: Timely Offloading in Mobile Edge Cloud SystemsSubjects: Systems and Control (eess.SY)
Future real-time applications like smart cities will use complex Machine Learning (ML) models for a variety of tasks. Timely status information is required for these applications to be reliable. Offloading computation to a mobile edge cloud (MEC) can reduce the completion time of these tasks. However, using the MEC may come at a cost such as related to use of a cloud service or privacy. In this paper, we consider a source that generates time-stamped status updates for delivery to a monitor after processing by the mobile device or MEC. We study how a scheduler must forward these updates to achieve timely updates at the monitor but also limit MEC usage. We measure timeliness at the monitor using the age of information (AoI) metric. We formulate this problem as an infinite horizon Markov decision process (MDP) with an average cost criterion. We prove that an optimal scheduling policy has an age-threshold structure that depends on how long an update has been in service.
- [302] arXiv:2405.07275 [pdf, ps, other]
-
Title: Distribution-Preserving Integrated Sensing and Communication with Secure ReconstructionComments: Accepted by ISIT2024Subjects: Information Theory (cs.IT)
Distribution-preserving integrated sensing and communication with secure reconstruction is investigated in this paper. In addition to the distortion constraint, we impose another constraint on the distance between the reconstructed sequence distribution and the original state distribution to force the system to preserve the statistical property of the channel states. An inner bound of the distribution-preserving capacity-distortion region is provided with some capacity region results under special cases. A numerical example demonstrates the tradeoff between the communication rate, reconstruction distortion and distribution preservation. Furthermore, we consider the case that the reconstructed sequence should be kept secret from an eavesdropper who also observes the channel output. An inner bound of the tradeoff region and a capacity-achieving special case are presented.
- [303] arXiv:2405.07277 [pdf, ps, other]
-
Title: Mining Influential Spreaders in Complex Networks by an Effective Combination of the Degree and K-ShellComments: 6 page, In 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), pp. 1-6. IEEE, 2024Subjects: Social and Information Networks (cs.SI)
Graph mining is an important technique that used in many applications such as predicting and understanding behaviors and information dissemination within networks. One crucial aspect of graph mining is the identification and ranking of influential nodes, which has applications in various fields including marketing, social communications, and disease control. However, existing models and methods come with high computational complexity and may not accurately distinguish and identify influential nodes. This paper develops a method based on the k-shell index and degree centrality of nodes and their neighbors. Comparisons to previous works, such as Degree and Neighborhood information Centrality (DNC) and Neighborhood and Path Information Centrality (NPIC), are conducted. The evaluations, which include the correctness with Kendall's Tau, resolution with monotonicity index, correlation plots, and time complexity, demonstrate its superior results.
- [304] arXiv:2405.07278 [pdf, ps, other]
-
Title: Human-interpretable clustering of short-text using large language modelsComments: Main text: 18 pages, 8 figures. Supplementary: 21 pages, 15 figures, 3 tablesSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models have seen extraordinary growth in popularity due to their human-like content generation capabilities. We show that these models can also be used to successfully cluster human-generated content, with success defined through the measures of distinctiveness and interpretability. This success is validated by both human reviewers and ChatGPT, providing an automated means to close the 'validation gap' that has challenged short-text clustering. Comparing the machine and human approaches we identify the biases inherent in each, and question the reliance on human-coding as the 'gold standard'. We apply our methodology to Twitter bios and find characteristic ways humans describe themselves, agreeing well with prior specialist work, but with interesting differences characteristic of the medium used to express identity.
- [305] arXiv:2405.07280 [pdf, ps, html, other]
-
Title: Humor Mechanics: Advancing Humor Generation with Multistep ReasoningComments: ICCC 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
In this paper, we explore the generation of one-liner jokes through multi-step reasoning. Our work involved reconstructing the process behind creating humorous one-liners and developing a working prototype for humor generation. We conducted comprehensive experiments with human participants to evaluate our approach, comparing it with human-created jokes, zero-shot GPT-4 generated humor, and other baselines. The evaluation focused on the quality of humor produced, using human labeling as a benchmark. Our findings demonstrate that the multi-step reasoning approach consistently improves the quality of generated humor. We present the results and share the datasets used in our experiments, offering insights into enhancing humor generation with artificial intelligence.
- [306] arXiv:2405.07282 [pdf, ps, other]
-
Title: Branching Narratives: Character Decision Points DetectionComments: GamesAndNLP @ LREC COLING 2024Subjects: Computation and Language (cs.CL)
This paper presents the Character Decision Points Detection (CHADPOD) task, a task of identification of points within narratives where characters make decisions that may significantly influence the story's direction. We propose a novel dataset based on CYOA-like games graphs to be used as a benchmark for such a task. We provide a comparative analysis of different models' performance on this task, including a couple of LLMs and several MLMs as baselines, achieving up to 89% accuracy. This underscores the complexity of narrative analysis, showing the challenges associated with understanding character-driven story dynamics. Additionally, we show how such a model can be applied to the existing text to produce linear segments divided by potential branching points, demonstrating the practical application of our findings in narrative analysis.
- [307] arXiv:2405.07283 [pdf, ps, other]
-
Title: BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global MapsComments: The first two authors are co-first authors. 8 pages, accepted by RA-LSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired ghost tracks that are mixed up with the static environment. Existing dynamic removal methods normally fail to balance the performance in computational efficiency and accuracy. In response, we present BeautyMap to efficiently remove the dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract the environment features. With a bit-wise comparison between matrices of each frame and the corresponding map region, we can extract potential dynamic regions. Then we use coarse to fine hierarchical segmentation of the $z$-axis to handle terrain variations. The final static restoration module accounts for the range-visibility of each single scan and protects static points out of sight. Comparative experiments underscore BeautyMap's superior performance in both accuracy and efficiency against other dynamic points removal methods. The code is open-sourced at this https URL.
- [308] arXiv:2405.07284 [pdf, ps, other]
-
Title: Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)Comments: 5 pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) \cite{kirillov2023segment} with the Contrastive Language-Image Pretraining (CLIP) \cite{radford2021learning}. By incorporating text prompts into SAM using CLIP, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset, allowing it to learn meaningful image-text representations. SLIP demonstrates the ability to recognize and segment objects in images based on contextual information from text prompts, expanding the capabilities of SAM for versatile object segmentation. Our experiments demonstrate the effectiveness of the SLIP architecture in segmenting objects in images based on textual cues. The integration of CLIP's text-image understanding capabilities into SAM expands the capabilities of the original architecture and enables more versatile and context-aware object segmentation.
- [309] arXiv:2405.07288 [pdf, ps, other]
-
Title: Erasing Concepts from Text-to-Image Diffusion Models with Few-shot UnlearningComments: 23 pages, 28 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Generating images from text has become easier because of the scaling of diffusion models and advancements in the field of vision and language. These models are trained using vast amounts of data from the Internet. Hence, they often contain undesirable content such as copyrighted material. As it is challenging to remove such data and retrain the models, methods for erasing specific concepts from pre-trained models have been investigated. We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning in which a few real images are used. The discussion regarding the generated images after erasing a concept has been lacking. While there are methods for specifying the transition destination for concepts, the validity of the specified concepts is unclear. Our method implicitly achieves this by transitioning to the latent concepts inherent in the model or the images. Our method can erase a concept within 10 s, making concept erasure more accessible than ever before. Implicitly transitioning to related concepts leads to more natural concept erasure. We applied the proposed method to various concepts and confirmed that concept erasure can be achieved tens to hundreds of times faster than with current methods. By varying the parameters to be updated, we obtained results suggesting that, like previous research, knowledge is primarily accumulated in the feed-forward networks of the text encoder.
- [310] arXiv:2405.07291 [pdf, ps, other]
-
Title: Robust Beamforming with Gradient-based Liquid Neural NetworkXinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane DebbahSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication with the advanced beamforming technologies is a key enabler to meet the growing demands of future mobile communication. However, the dynamic nature of cellular channels in large-scale urban mmWave MIMO communication scenarios brings substantial challenges, particularly in terms of complexity and robustness. To address these issues, we propose a robust gradient-based liquid neural network (GLNN) framework that utilizes ordinary differential equation-based liquid neurons to solve the beamforming problem. Specifically, our proposed GLNN framework takes gradients of the optimization objective function as inputs to extract the high-order channel feature information, and then introduces a residual connection to mitigate the training burden. Furthermore, we use the manifold learning technique to compress the search space of the beamforming problem. These designs enable the GLNN to effectively maintain low complexity while ensuring strong robustness to noisy and highly dynamic channels. Extensive simulation results demonstrate that the GLNN can achieve 4.15% higher spectral efficiency than that of typical iterative algorithms, and reduce the time consumption to only 1.61% that of conventional methods.
- [311] arXiv:2405.07293 [pdf, ps, other]
-
Title: Sparse Sampling is All You Need for Fast Wrong-way Cycling Detection in CCTV VideosSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
In the field of transportation, it is of paramount importance to address and mitigate illegal actions committed by both motor and non-motor vehicles. Among those actions, wrong-way cycling (i.e., riding a bicycle or e-bike in the opposite direction of the designated traffic flow) poses significant risks to both cyclists and other road users. To this end, this paper formulates a problem of detecting wrong-way cycling ratios in CCTV videos. Specifically, we propose a sparse sampling method called WWC-Predictor to efficiently solve this problem, addressing the inefficiencies of direct tracking methods. Our approach leverages both detection-based information, which utilizes the information from bounding boxes, and orientation-based information, which provides insights into the image itself, to enhance instantaneous information capture capability. On our proposed benchmark dataset consisting of 35 minutes of video sequences and minute-level annotation, our method achieves an average error rate of a mere 1.475% while taking only 19.12% GPU time of straightforward tracking methods under the same detection model. This remarkable performance demonstrates the effectiveness of our approach in identifying and predicting instances of wrong-way cycling.
- [312] arXiv:2405.07305 [pdf, ps, other]
-
Title: Finding a Way Through the Social Media Labyrinth: Guiding Design Through User ExpectationsComments: This is the author-version of this work. The paper is submitted as a draft, as it is still under review, but was published at arXiv in support of a cumulative dissertationSubjects: Human-Computer Interaction (cs.HC)
Social networking services (SNS) have become integral to modern life to create and maintain meaningful relationships. Nevertheless, their historic growth of features has led to labyrinthine user interfaces (UIs) that often result in frustration among users - for instance, when trying to control privacy-related settings. This paper aims to mitigate labyrinthine UIs by studying users' expectations (N=21) through an online card sorting exercise based on 58 common SNS UI features, teaching us about their expectations regarding the importance of specific UI features and the frequency with which they use them. Our findings offer a valuable understanding of the relationship between the importance and frequency of UI features and provide design considerations for six identified UI feature groups. Through these findings, we inform the design and development of user-centred alternatives to current SNS interfaces that enable users to successfully navigate SNS and feel in control over their data by meeting their expectations.
- [313] arXiv:2405.07306 [pdf, ps, other]
-
Title: Point Resampling and Ray Transformation Aid to Editable NeRF ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
In NeRF-aided editing tasks, object movement presents difficulties in supervision generation due to the introduction of variability in object positions. Moreover, the removal operations of certain scene objects often lead to empty regions, presenting challenges for NeRF models in inpainting them effectively. We propose an implicit ray transformation strategy, allowing for direct manipulation of the 3D object's pose by operating on the neural-point in NeRF rays. To address the challenge of inpainting potential empty regions, we present a plug-and-play inpainting module, dubbed differentiable neural-point resampling (DNR), which interpolates those regions in 3D space at the original ray locations within the implicit space, thereby facilitating object removal & scene inpainting tasks. Importantly, employing DNR effectively narrows the gap between ground truth and predicted implicit features, potentially increasing the mutual information (MI) of the features across rays. Then, we leverage DNR and ray transformation to construct a point-based editable NeRF pipeline PR^2T-NeRF. Results primarily evaluated on 3D object removal & inpainting tasks indicate that our pipeline achieves state-of-the-art performance. In addition, our pipeline supports high-quality rendering visualization for diverse editing operations without necessitating extra supervision.
- [314] arXiv:2405.07309 [pdf, ps, other]
-
Title: DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language ModelSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward design, a labor-intensive process. In this paper, we propose DiffGen, a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model to enable automatic and efficient generation of robot demonstrations. Given a simulated robot manipulation scenario and a natural language instruction, DiffGen can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation after manipulation. The embeddings are obtained from the vision-language model, and the optimization is achieved by calculating and descending gradients through the differentiable simulation, differentiable rendering, and vision-language model components, thereby accomplishing the specified task. Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
- [315] arXiv:2405.07310 [pdf, ps, other]
-
Title: Machine Learning-Based Protection and Fault Identification of 100% Inverter-Based MicrogridsComments: Accepted for publication at 2024 IEEE 33rd International Symposium on Industrial Electronics (ISIE)Subjects: Systems and Control (eess.SY)
100% inverter-based renewable units are becoming more prevalent, introducing new challenges in the protection of microgrids that incorporate these resources. This is particularly due to low fault currents and bidirectional flows. Previous work has studied the protection of microgrids with high penetration of inverter-interfaced distributed generators; however, very few have studied the protection of a 100% inverter-based microgrid. This work proposes machine learning (ML)-based protection solutions using local electrical measurements that consider implementation challenges and effectively combine short-circuit fault detection and type identification. A decision tree method is used to analyze a wide range of fault scenarios. PSCAD/EMTDC simulation environment is used to create a dataset for training and testing the proposed method. The effectiveness of the proposed methods is examined under seven distinct fault types, each featuring varying fault resistance, in a 100% inverter-based microgrid consisting of four inverters.
- [316] arXiv:2405.07311 [pdf, ps, other]
-
Title: A Compact Delay Model for OTS DevicesSubjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
This paper presents a novel compact delay model of Ovonic Threshold Switch (OTS) devices that works efficiently for circuit simulations. The internal state variable of the two terminal devices is estimated using a delay system that uses a few electrical components related to a suggested equivalent circuit of the device. Finally, we tested the proposed model against measured data from devices fabricated by Western Digital Research.
- [317] arXiv:2405.07312 [pdf, ps, other]
-
Title: Nonparametric Control-Koopman Operator Learning: Flexible and Scalable Models for Prediction and ControlSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Linearity of Koopman operators and simplicity of their estimators coupled with model-reduction capabilities has lead to their great popularity in applications for learning dynamical systems. While nonparametric Koopman operator learning in infinite-dimensional reproducing kernel Hilbert spaces is well understood for autonomous systems, its control system analogues are largely unexplored. Addressing systems with control inputs in a principled manner is crucial for fully data-driven learning of controllers, especially since existing approaches commonly resort to representational heuristics or parametric models of limited expressiveness and scalability. We address the aforementioned challenge by proposing a universal framework via control-affine reproducing kernels that enables direct estimation of a single operator even for control systems. The proposed approach, called control-Koopman operator regression (cKOR), is thus completely analogous to Koopman operator regression of the autonomous case. First in the literature, we present a nonparametric framework for learning Koopman operator representations of nonlinear control-affine systems that does not suffer from the curse of control input dimensionality. This allows for reformulating the infinite-dimensional learning problem in a finite-dimensional space based solely on data without apriori loss of precision due to a restriction to a finite span of functions or inputs as in other approaches. For enabling applications to large-scale control systems, we also enhance the scalability of control-Koopman operator estimators by leveraging random projections (sketching). The efficacy of our novel cKOR approach is demonstrated on both forecasting and control tasks.
- [318] arXiv:2405.07314 [pdf, ps, other]
-
Title: Learnable Tokenizer for LLM-based Generative RecommendationSubjects: Information Retrieval (cs.IR)
Harnessing Large Language Models (LLMs) for generative recommendation has garnered significant attention due to LLMs' powerful capacities such as rich world knowledge and reasoning. However, a critical challenge lies in transforming recommendation data into the language space of LLMs through effective item tokenization. Existing approaches, such as ID identifiers, textual identifiers, and codebook-based identifiers, exhibit limitations in encoding semantic information, incorporating collaborative signals, or handling code assignment bias. To address these shortcomings, we propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), designed to meet the key criteria of identifiers by integrating hierarchical semantics, collaborative signals, and code assignment diversity. LETTER integrates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias. We instantiate LETTER within two generative recommender models and introduce a ranking-guided generation loss to enhance their ranking ability. Extensive experiments across three datasets demonstrate the superiority of LETTER in item tokenization, thereby advancing the state-of-the-art in the field of generative recommendation.
- [319] arXiv:2405.07316 [pdf, ps, other]
-
Title: VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial PresenceMayank Bakshi, Sara Ghasvarianjahromi, Yauhen Yakimenka, Allison Beemer, Oliver Kosut, Joerg KliewerComments: This is an extended version of the paper at International Symposium on Information Theory 2024Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data and possible adversarial infiltration. We require (a) convergence to a global empirical loss minimizer when adversaries are absent, and (b) either detection of adversarial presence of convergence to an admissible consensus irrespective of the adversarial configuration. To this end, we propose the VALID protocol which, to the best of our knowledge, is the first to achieve a validated learning guarantee. Moreover, VALID offers an O(1/T) convergence rate (under pertinent regularity assumptions), and computational and communication complexities comparable to non-adversarial distributed stochastic gradient descent. Remarkably, VALID retains optimal performance metrics in adversary-free environments, sidestepping the robustness penalties observed in prior byzantine-robust methods. A distinctive aspect of our study is a heterogeneity metric based on the norms of individual agents' gradients computed at the global empirical loss minimizer. This not only provides a natural statistic for detecting significant byzantine disruptions but also allows us to prove the optimality of VALID in wide generality. Lastly, our numerical results reveal that, in the absence of adversaries, VALID converges faster than state-of-the-art byzantine robust algorithms, while when adversaries are present, VALID terminates with each honest either converging to an admissible consensus of declaring adversarial presence in the network.
- [320] arXiv:2405.07317 [pdf, ps, other]
-
Title: Machine Unlearning in Contrastive LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Machine unlearning is a complex process that necessitates the model to diminish the influence of the training data while keeping the loss of accuracy to a minimum. Despite the numerous studies on machine unlearning in recent years, the majority of them have primarily focused on supervised learning models, leaving research on contrastive learning models relatively underexplored. With the conviction that self-supervised learning harbors a promising potential, surpassing or rivaling that of supervised learning, we set out to investigate methods for machine unlearning centered around contrastive learning models. In this study, we introduce a novel gradient constraint-based approach for training the model to effectively achieve machine unlearning. Our method only necessitates a minimal number of training epochs and the identification of the data slated for unlearning. Remarkably, our approach demonstrates proficient performance not only on contrastive learning models but also on supervised learning models, showcasing its versatility and adaptability in various learning paradigms.
- [321] arXiv:2405.07319 [pdf, ps, other]
-
Title: LayGA: Layered Gaussian Avatars for Animatable Clothing TransferComments: SIGGRAPH 2024 conference trackSubjects: Computer Vision and Pattern Recognition (cs.CV)
Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods. Our project page is this https URL.
- [322] arXiv:2405.07320 [pdf, ps, other]
-
Title: L(u)PIN: LLM-based Political Ideology NowcastingSubjects: Computation and Language (cs.CL)
The quantitative analysis of political ideological positions is a difficult task. In the past, various literature focused on parliamentary voting data of politicians, party manifestos and parliamentary speech to estimate political disagreement and polarization in various political systems. However previous methods of quantitative political analysis suffered from a common challenge which was the amount of data available for analysis. Also previous methods frequently focused on a more general analysis of politics such as overall polarization of the parliament or party-wide political ideological positions. In this paper, we present a method to analyze ideological positions of individual parliamentary representatives by leveraging the latent knowledge of LLMs. The method allows us to evaluate the stance of politicians on an axis of our choice allowing us to flexibly measure the stance of politicians in regards to a topic/controversy of our choice. We achieve this by using a fine-tuned BERT classifier to extract the opinion-based sentences from the speeches of representatives and projecting the average BERT embeddings for each representative on a pair of reference seeds. These reference seeds are either manually chosen representatives known to have opposing views on a particular topic or they are generated sentences which where created using the GPT-4 model of OpenAI. We created the sentences by prompting the GPT-4 model to generate a speech that would come from a politician defending a particular position.
- [323] arXiv:2405.07324 [pdf, ps, other]
-
Title: QACM: QoS-Aware xApp Conflict Mitigation in Open RANSubjects: Networking and Internet Architecture (cs.NI)
The advent of Open Radio Access Network (RAN) has revolutionized the field of RAN by introducing elements of native support of intelligence and openness into the next generation of mobile network infrastructure. Open RAN paves the way for standardized interfaces and enables the integration of network applications from diverse vendors, thereby enhancing network management flexibility. However, control decision conflicts occur when components from different vendors are deployed together. This article provides an overview of various types of conflicts that may occur in Open RAN, with a particular focus on intra-component conflict mitigation among Extended Applications (xApps) in the Near Real Time RAN Intelligent Controller (Near-RT-RIC). A QoS-Aware Conflict Mitigation (QACM) method is proposed that finds the optimal configuration of conflicting parameters while maximizing the number of xApps that have their Quality of Service (QoS) requirements met. We compare the performance of the proposed QACM method with two benchmark methods for priority and non-priority cases. The results indicate that our proposed method is the most effective in maintaining QoS requirements for conflicting xApps.
- [324] arXiv:2405.07326 [pdf, ps, other]
-
Title: Power Evaluation of IOT Application Layer ProtocolsSubjects: Networking and Internet Architecture (cs.NI)
The Internet of Things has affected all aspects of daily life, and the number of IoT devices is increasing day by day. According to forecasts, the number of Internet of Things devices will reach one trillion devices by 2035. The increase in the number of devices connected to the Internet will cause various concerns. One of the most important concerns is the energy and power consumption of these devices. Although Internet of Things modules are low in energy consumption, their widespread and large-scale use has made the issue of power consumption become the most important challenge in this field. For this reason, it is necessary to use communication protocols that, in addition to establishing efficient communication, impose minimal power consumption on the network. In this paper, application layer protocols such as MQTT, MQTT-SN, CoAP, and HTTP are simulated using the tools available in the Contiki operating system, including COOJA and Powertrace, and they { are evaluated} and compared with each other in terms of power consumption. According to the simulations performed by the mentioned tools, the MQTT-SN protocol was the least consuming protocol in terms of power consumption. After that, the CoAP protocol is placed, and with a slight difference, the MQTT protocol, which consumes more than MQTT-SN. Finally, the HTTP protocol consumes the most power, which makes it unsuitable for communication in the Internet of Things
- [325] arXiv:2405.07327 [pdf, ps, html, other]
-
Title: Liquid Ensemble Selection for Continual LearningComments: Accepted at Canadian AI Conference 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Continual learning aims to enable machine learning models to continually learn from a shifting data distribution without forgetting what has already been learned. Such shifting distributions can be broken into disjoint subsets of related examples; by training each member of an ensemble on a different subset it is possible for the ensemble as a whole to achieve much higher accuracy with less forgetting than a naive model. We address the problem of selecting which models within an ensemble should learn on any given data, and which should predict. By drawing on work from delegative voting we develop an algorithm for using delegation to dynamically select which models in an ensemble are active. We explore a variety of delegation methods and performance metrics, ultimately finding that delegation is able to provide a significant performance boost over naive learning in the face of distribution shifts.
- [326] arXiv:2405.07331 [pdf, ps, other]
-
Title: Stochastic Bandits with ReLU Neural NetworksSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
- [327] arXiv:2405.07332 [pdf, ps, other]
-
Title: PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and ClassificationMohammad Shafiul Alam, Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Md. Rabius Sani, Khan Md HasibSubjects: Computer Vision and Pattern Recognition (cs.CV)
Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently face the difficulty of overfitting, resulting in lower segmentation performance. In the context of potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to quickly and properly identify these diseases. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed as PotatoGANs. In this novel data augmentation approach, two types of Generative Adversarial Networks (GANs) are utilized to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception score as a measure, our experiments show the better quality and realisticness of the images created by PotatoGANs, emphasizing their capacity to resemble real disease images closely. The CycleGAN model outperforms the Pix2Pix GAN model in terms of image quality, as evidenced by its higher IS scores CycleGAN achieves higher Inception scores (IS) of 1.2001 and 1.0900 for black scurf and common scab, respectively. This synthetic data can significantly improve the training of large neural networks. It also reduces data collection costs while enhancing data diversity and generalization capabilities. Our work improves interpretability by combining three gradient-based Explainable AI algorithms (GradCAM, GradCAM++, and ScoreCAM) with three distinct CNN architectures (DenseNet169, Resnet152 V2, InceptionResNet V2) for potato disease classification.
- [328] arXiv:2405.07335 [pdf, ps, other]
-
Title: Tremor Reduction for Accessible Ray Based Interaction in VR ApplicationsComments: The pre-print contains 7 pages, 5 figures and 4 tables. The attached pre-print is an extract containing some information about the completed study results, the full paper is in review at the appropriate journal. This pre-print is released to support developers implementing tremor reduction solutions for VR now as its been in the review process for yearsSubjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Comparative to conventional 2D interaction methods, virtual reality (VR) demonstrates an opportunity for unique interface and interaction design decisions. Currently, this poses a challenge when developing an accessible VR experience as existing interaction techniques may not be usable by all users. It was discovered that many traditional 2D interface interaction methods have been directly converted to work in a VR space with little alteration to the input mechanism, such as the use of a laser pointer designed to that of a traditional cursor. It is recognized that distanceindependent millimetres can support designers in developing interfaces that scale in virtual worlds. Relevantly, Fitts law states that as distance increases, user movements are increasingly slower and performed less accurately. In this paper we propose the use of a low pass filter, to normalize user input noise, alleviating fine motor requirements during ray-based interaction. A development study was conducted to understand the feasibility of implementing such a filter and explore its effects on end users experience. It demonstrates how an algorithm can provide an opportunity for a more accurate and consequently less frustrating experience by filtering and reducing involuntary hand tremors. Further discussion on existing VR design philosophies is also conducted, analysing evidence that supports multisensory feedback and psychological models. The completed study can be downloaded from GitHub.
- [329] arXiv:2405.07336 [pdf, ps, other]
-
Title: Data Trading Combination Auction Mechanism based on the Exponential MechanismSubjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
With the widespread application of machine learning technology in recent years, the demand for training data has increased significantly, leading to the emergence of research areas such as data trading. The work in this field is still in the developmental stage. Different buyers have varying degrees of demand for various types of data, and auctions play a role in such scenarios due to their authenticity and fairness. Recent related work has proposed combination auction mechanisms for different domains. However, such mechanisms have not addressed the privacy concerns of buyers. In this paper, we design a \textit{Data Trading Combination Auction Mechanism based on the exponential mechanism} (DCAE) to protect buyers' bidding privacy from being leaked. We apply the exponential mechanism to select the final settlement price for the auction and generate a probability distribution based on the relationship between the price and the revenue. In the experimental aspect, we consider the selection of different mechanisms under two scenarios, and the experimental results show that this method can ensure high auction revenue and protect buyers' privacy from being violated.
- [330] arXiv:2405.07340 [pdf, ps, other]
-
Title: Machine Consciousness as Pseudoscience: The Myth of Conscious MachinesComments: 19 pagesSubjects: Computers and Society (cs.CY)
The hypothesis of conscious machines has been debated since the invention of the notion of artificial intelligence, powered by the assumption that the computational intelligence achieved by a system is the cause of the emergence of phenomenal consciousness in that system as an epiphenomenon or as a consequence of the behavioral or internal complexity of the system surpassing some threshold. As a consequence, a huge amount of literature exploring the possibility of machine consciousness and how to implement it on a computer has been published. Moreover, common folk psychology and transhumanism literature has fed this hypothesis with the popularity of science fiction literature, where intelligent robots are usually antropomorphized and hence given phenomenal consciousness. However, in this work, we argue how these literature lacks scientific rigour, being impossible to falsify the opposite hypothesis, and illustrate a list of arguments that show how every approach that the machine consciousness literature has published depends on philosophical assumptions that cannot be proven by the scientific method. Concretely, we also show how phenomenal consciousness is not computable, independently on the complexity of the algorithm or model, cannot be objectively measured nor quantitatively defined and it is basically a phenomenon that is subjective and internal to the observer. Given all those arguments we end the work arguing why the idea of conscious machines is nowadays a myth of transhumanism and science fiction culture.
- [331] arXiv:2405.07343 [pdf, ps, other]
-
Title: Graph neural networks for power grid operational risk assessment under evolving grid topologyComments: Manuscript submitted to Applied EnergySubjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Methodology (stat.ME)
This article investigates the ability of graph neural networks (GNNs) to identify risky conditions in a power grid over the subsequent few hours, without explicit, high-resolution information regarding future generator on/off status (grid topology) or power dispatch decisions. The GNNs are trained using supervised learning, to predict the power grid's aggregated bus-level (either zonal or system-level) or individual branch-level state under different power supply and demand conditions. The variability of the stochastic grid variables (wind/solar generation and load demand), and their statistical correlations, are rigorously considered while generating the inputs for the training data. The outputs in the training data, obtained by solving numerous mixed-integer linear programming (MILP) optimal power flow problems, correspond to system-level, zonal and transmission line-level quantities of interest (QoIs). The QoIs predicted by the GNNs are used to conduct hours-ahead, sampling-based reliability and risk assessment w.r.t. zonal and system-level (load shedding) as well as branch-level (overloading) failure events. The proposed methodology is demonstrated for three synthetic grids with sizes ranging from 118 to 2848 buses. Our results demonstrate that GNNs are capable of providing fast and accurate prediction of QoIs and can be good proxies for computationally expensive MILP algorithms. The excellent accuracy of GNN-based reliability and risk assessment suggests that GNN models can substantially improve situational awareness by quickly providing rigorous reliability and risk estimates.
- [332] arXiv:2405.07344 [pdf, ps, other]
-
Title: TKAN: Temporal Kolmogorov-Arnold NetworksSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by the Kolmogorov-Arnold Networks (KANs) a promising alternatives to Multi-Layer Perceptrons (MLPs), we proposed a new neural networks architecture inspired by KAN and the LSTM, the Temporal Kolomogorov-Arnold Networks (TKANs). TKANs combined the strenght of both networks, it is composed of Recurring Kolmogorov-Arnold Networks (RKANs) Layers embedding memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring more than one step ahead forecasting.
- [333] arXiv:2405.07346 [pdf, ps, other]
-
Title: Understanding and Evaluating Human Preferences for AI Generated Images with Instruction TuningSubjects: Computer Vision and Pattern Recognition (cs.CV)
Artificial Intelligence Generated Content (AIGC) has grown rapidly in recent years, among which AI-based image generation has gained widespread attention due to its efficient and imaginative image creation ability. However, AI-generated Images (AIGIs) may not satisfy human preferences due to their unique distortions, which highlights the necessity to understand and evaluate human preferences for AIGIs. To this end, in this paper, we first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+, which provides human visual preference scores and detailed preference explanations from three perspectives including quality, authenticity, and correspondence. Then, based on the constructed AIGCIQA2023+ database, this paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning. Specifically, the MINT-IQA model first learn and evaluate human preferences for AI-generated Images from multi-perspectives, then via the vision-language instruction tuning strategy, MINT-IQA attains powerful understanding and explanation ability for human visual preference on AIGIs, which can be used for feedback to further improve the assessment capabilities. Extensive experimental results demonstrate that the proposed MINT-IQA model achieves state-of-the-art performance in understanding and evaluating human visual preferences for AIGIs, and the proposed model also achieves competing results on traditional IQA tasks compared with state-of-the-art IQA models. The AIGCIQA2023+ database and MINT-IQA model will be released to facilitate future research.
- [334] arXiv:2405.07348 [pdf, ps, other]
-
Title: MedConceptsQA -- Open Source Medical Concepts QA BenchmarkSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
We present MedConceptsQA, a dedicated open source benchmark for medical concepts question answering. The benchmark comprises of questions of various medical concepts across different vocabularies: diagnoses, procedures, and drugs. The questions are categorized into three levels of difficulty: easy, medium, and hard. We conducted evaluations of the benchmark using various Large Language Models. Our findings show that pre-trained clinical Large Language Models achieved accuracy levels close to random guessing on this benchmark, despite being pre-trained on medical data. However, GPT-4 achieves an absolute average improvement of nearly 27%-37% (27% for zero-shot learning and 37% for few-shot learning) when compared to clinical Large Language Models. Our benchmark serves as a valuable resource for evaluating the understanding and reasoning of medical concepts by Large Language Models. Our benchmark is available at this https URL
- [335] arXiv:2405.07349 [pdf, ps, other]
-
Title: WeedScout: Real-Time Autonomous blackgrass Classification and Mapping using dedicated hardwareMatthew Gazzard, Helen Hicks, Isibor Kennedy Ihianle, Jordan J. Bird, Md Mahmudul Hasan, Pedro MachadoSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Blackgrass (Alopecurus myosuroides) is a competitive weed that has wide-ranging impacts on food security by reducing crop yields and increasing cultivation costs. In addition to the financial burden on agriculture, the application of herbicides as a preventive to blackgrass can negatively affect access to clean water and sanitation. The WeedScout project introduces a Real-Rime Autonomous Black-Grass Classification and Mapping (RT-ABGCM), a cutting-edge solution tailored for real-time detection of blackgrass, for precision weed management practices. Leveraging Artificial Intelligence (AI) algorithms, the system processes live image feeds, infers blackgrass density, and covers two stages of maturation. The research investigates the deployment of You Only Look Once (YOLO) models, specifically the streamlined YOLOv8 and YOLO-NAS, accelerated at the edge with the NVIDIA Jetson Nano (NJN). By optimising inference speed and model performance, the project advances the integration of AI into agricultural practices, offering potential solutions to challenges such as herbicide resistance and environmental impact. Additionally, two datasets and model weights are made available to the research community, facilitating further advancements in weed detection and precision farming technologies.
- [336] arXiv:2405.07351 [pdf, ps, html, other]
-
Title: A Standard Rigid Transformation Notation Convention for Robotics ResearchComments: 43 pages, 8 figures, 5 tables, 1 code listingSubjects: Robotics (cs.RO)
Notation conventions for rigid transformations are as diverse as they are fundamental to the field of robotics. A well-defined convention that is practical, consistent and unambiguous is essential for the clear communication of ideas and to foster collaboration between researchers. This work presents an analysis of conventions used in state-of-the-art robotics research, defines a new notation convention, and provides software packages to facilitate its use. To shed some light on the current state of notation conventions in robotics research, this work presents an analysis of the ICRA 2023 proceedings, focusing on the notation conventions used for rigid transformations. A total of 1655 papers were inspected to identify the convention used, and key insights about trends and usage preferences are derived. Based on this analysis, a new notation convention called RIGID is defined, which complies with the "ISO 80000 Standard on Quantities and Units". The RIGID convention is designed to be concise yet unambiguous and easy to use. Additionally, this work introduces a LaTeX package that facilitates the use of the RIGID notation in manuscripts preparation through simple customizable commands that can be easily translated into variable names for software development.
- [337] arXiv:2405.07353 [pdf, ps, other]
-
Title: Distributed Lov\'{a}sz Local Lemma under Bandwidth LimitationsSubjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
The constructive Lovász Local Lemma has become a central tool for designing efficient distributed algorithms. While it has been extensively studied in the classic LOCAL model that uses unlimited bandwidth, much less is known in the bandwidth-restricted CONGEST model.
In this paper, we present bandwidth- and time-efficient algorithms for various subclasses of LLL problems, including a large class of subgraph sampling problems that are naturally formulated as LLLs. Lastly, we use our LLLs to design efficient CONGEST algorithms for coloring sparse and triangle-free graphs with few colors. These coloring algorithms are exponentially faster than previous LOCAL model algorithms. - [338] arXiv:2405.07354 [pdf, ps, other]
-
Title: SoccerNet-Echoes: A Soccer Game Audio Commentary DatasetSushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak ShahSubjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games. We detail the methods involved in the curation of this dataset and the integration of ASR. We also highlight the implications of a multimodal approach in sports analytics, and how the enriched dataset can support diverse applications, thus broadening the scope of research and development in the field of sports analytics.
- [339] arXiv:2405.07358 [pdf, ps, other]
-
Title: A Value Driven Framework for Cybersecurity Innovation in Transportation & InfrastructureSubjects: Cryptography and Security (cs.CR)
This paper introduces a value-driven cybersecurity innovation framework for the transportation and infrastructure sectors, as opposed to the traditional market-centric approaches that have dominated the field. Recontextualizing innovation categories into sustaining, incremental, disruptive, and transformative, we aim to foster a culture of self-innovation within organizations, enabling a strategic focus on cybersecurity measures that directly contribute to business value and strategic goals. This approach enhances operational effectiveness and efficiency of cyber defences primarily, while also aligns cybersecurity initiatives with mission-critical objectives. We detail a practical method for evaluating the business value of cybersecurity innovations and present a pragmatic approach for organizations to funnel innovative ideas in a structured and repeatable manner. The framework is designed to reinforce cybersecurity capabilities against an evolving cyber threat landscape while maintaining infrastructural integrity. Shifting the focus from general market appeal to sector-specific needs, our framework provides cybersecurity leaders with the strategic cyber-foresight necessary for prioritizing impactful initiatives, thereby making cybersecurity a core business enabler rather than a burden.
- [340] arXiv:2405.07359 [pdf, ps, other]
-
Title: Forecasting with an N-dimensional Langevin Equation and a Neural-Ordinary Differential EquationComments: 26 pages, 7 figuresJournal-ref: Chaos, 34, 043105 (2024)Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)
Accurate prediction of electricity day-ahead prices is essential in competitive electricity markets. Although stationary electricity-price forecasting techniques have received considerable attention, research on non-stationary methods is comparatively scarce, despite the common prevalence of non-stationary features in electricity markets. Specifically, existing non-stationary techniques will often aim to address individual non-stationary features in isolation, leaving aside the exploration of concurrent multiple non-stationary effects. Our overarching objective here is the formulation of a framework to systematically model and forecast non-stationary electricity-price time series, encompassing the broader scope of non-stationary behavior. For this purpose we develop a data-driven model that combines an N-dimensional Langevin equation (LE) with a neural-ordinary differential equation (NODE). The LE captures fine-grained details of the electricity-price behavior in stationary regimes but is inadequate for non-stationary conditions. To overcome this inherent limitation, we adopt a NODE approach to learn, and at the same time predict, the difference between the actual electricity-price time series and the simulated price trajectories generated by the LE. By learning this difference, the NODE reconstructs the non-stationary components of the time series that the LE is not able to capture. We exemplify the effectiveness of our framework using the Spanish electricity day-ahead market as a prototypical case study. Our findings reveal that the NODE nicely complements the LE, providing a comprehensive strategy to tackle both stationary and non-stationary electricity-price behavior. The framework's dependability and robustness is demonstrated through different non-stationary scenarios by comparing it against a range of basic naive methods.
- [341] arXiv:2405.07363 [pdf, ps, other]
-
Title: Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple BaselinesSubjects: Computation and Language (cs.CL)
We introduce a dataset on political orientation and power position identification. The dataset is derived from ParlaMint, a set of comparable corpora of transcribed parliamentary speeches from 29 national and regional parliaments. We introduce the dataset, provide the reasoning behind some of the choices during its creation, present statistics on the dataset, and, using a simple classifier, some baseline results on predicting political orientation on the left-to-right axis, and on power position identification, i.e., distinguishing between the speeches delivered by governing coalition party members from those of opposition party members.
- [342] arXiv:2405.07364 [pdf, ps, other]
-
Title: BoQ: A Place is Worth a Bag of Learnable QueriesComments: Accepted at CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at this https URL.
- [343] arXiv:2405.07368 [pdf, ps, other]
-
Title: A New Algorithm for Computing $\alpha$-CapacitySubjects: Information Theory (cs.IT)
The problem of computing $\alpha$-capacity for $\alpha>1$ is equivalent to that of computing the correct decoding exponent. Various algorithms for computing them have been proposed, such as Arimoto and Jitsumatsu--Oohama algorithm. In this study, we propose a novel alternating optimization algorithm for computing the $\alpha$-capacity for $\alpha>1$ based on a variational characterization of the Augustin--Csisz{á}r mutual information. A comparison of the convergence performance of these algorithms is demonstrated through numerical examples.
- [344] arXiv:2405.07369 [pdf, ps, other]
-
Title: Incorporating Anatomical Awareness for Enhanced Generalizability and Progression Prediction in Deep Learning-Based Radiographic Sacroiliitis DetectionFelix J. Dorfner, Janis L. Vahldiek, Leonhard Donle, Andrei Zhukov, Lina Xu, Hartmut Häntze, Marcus R. Makowski, Hugo J.W.L. Aerts, Fabian Proft, Valeria Rios Rodriguez, Judith Rademacher, Mikhail Protopopov, Hildrun Haibel, Torsten Diekhoff, Murat Torgutalp, Lisa C. Adams, Denis Poddubnyy, Keno K. BressemSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Purpose: To examine whether incorporating anatomical awareness into a deep learning model can improve generalizability and enable prediction of disease progression.
Methods: This retrospective multicenter study included conventional pelvic radiographs of 4 different patient cohorts focusing on axial spondyloarthritis (axSpA) collected at university and community hospitals. The first cohort, which consisted of 1483 radiographs, was split into training (n=1261) and validation (n=222) sets. The other cohorts comprising 436, 340, and 163 patients, respectively, were used as independent test datasets. For the second cohort, follow-up data of 311 patients was used to examine progression prediction capabilities. Two neural networks were trained, one on images cropped to the bounding box of the sacroiliac joints (anatomy-aware) and the other one on full radiographs. The performance of the models was compared using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
Results: On the three test datasets, the standard model achieved AUC scores of 0.853, 0.817, 0.947, with an accuracy of 0.770, 0.724, 0.850. Whereas the anatomy-aware model achieved AUC scores of 0.899, 0.846, 0.957, with an accuracy of 0.821, 0.744, 0.906, respectively. The patients who were identified as high risk by the anatomy aware model had an odds ratio of 2.16 (95% CI: 1.19, 3.86) for having progression of radiographic sacroiliitis within 2 years.
Conclusion: Anatomical awareness can improve the generalizability of a deep learning model in detecting radiographic sacroiliitis. The model is published as fully open source alongside this study. - [345] arXiv:2405.07371 [pdf, ps, html, other]
-
Title: Extreme Distance Distributions of Poisson Voronoi CellsSubjects: Numerical Analysis (math.NA); Probability (math.PR)
Poisson point processes provide a versatile framework for modeling the distributions of random points in space. When the space is partitioned into cells, each associated with a single generating point from the Poisson process, there appears a geometric structure known as Poisson Voronoi tessellation. These tessellations find applications in various fields such as biology, material science, and communications, where the statistical properties of the Voronoi cells reveal patterns and structures that hold key insights into the underlying processes generating the observed phenomena.
In this paper, we investigate a distance measure of Poisson Voronoi tessellations that is emerging in the literature, yet for which its statistical and geometrical properties remain explored only in the asymptotic case when the density of seed points approaches infinity. Our work, specifically focused on homogeneous Poisson point processes, characterizes the cumulative distribution functions governing the smallest and largest distances between the points generating the Voronoi regions and their respective vertices for an arbitrary density of points in $\mathbb{R}^2$. For that, we conduct a Monte-Carlo type simulation with $10^8$ Voronoi cells and fit the resulting empirical cumulative distribution functions to the Generalized Gamma, Gamma, Log-normal, Rayleigh, and Weibull distributions. Our analysis compares these fits in terms of root mean-squared error and maximum absolute variation, revealing the Generalized Gamma distribution as the best-fit model for characterizing these distances in homogeneous Poisson Voronoi tessellations. Furthermore, we provide estimates for the maximum likelihood and the $95$\% confidence interval of the parameters of the Generalized Gamma distribution along with the algorithm implemented to calculate the maximum and minimum distances. - [346] arXiv:2405.07373 [pdf, ps, other]
-
Title: Probabilistic and Causal Satisfiability: the Impact of MarginalizationSubjects: Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
The framework of Pearl's Causal Hierarchy (PCH) formalizes three types of reasoning: observational, interventional, and counterfactual, that reflect the progressive sophistication of human thought regarding causation. We investigate the computational complexity aspects of reasoning in this framework focusing mainly on satisfiability problems expressed in probabilistic and causal languages across the PCH. That is, given a system of formulas in the standard probabilistic and causal languages, does there exist a model satisfying the formulas? The resulting complexity changes depending on the level of the hierarchy as well as the operators allowed in the formulas (addition, multiplication, or marginalization).
We focus on formulas involving marginalization that are widely used in probabilistic and causal inference, but whose complexity issues are still little explored. Our main contribution are the exact computational complexity results showing that linear languages (allowing addition and marginalization) yield NP^PP-, PSPACE-, and NEXP-complete satisfiability problems, depending on the level of the PCH. Moreover, we prove that the problem for the full language (allowing additionally multiplication) is complete for the class succ$\exists$R for languages on the highest, counterfactual level. Previous work has shown that the satisfiability problem is complete for succ$\exists$R on the lower levels leaving the counterfactual case open. Finally, we consider constrained models that are restricted to a small polynomial size. The constraint on the size reduces the complexity of the interventional and counterfactual languages to NEXP-complete. - [347] arXiv:2405.07374 [pdf, ps, other]
-
Title: Conformalized Survival Distributions: A Generic Post-Process to Increase CalibrationComments: Accepted to ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Discrimination and calibration represent two important properties of survival analysis, with the former assessing the model's ability to accurately rank subjects and the latter evaluating the alignment of predicted outcomes with actual events. With their distinct nature, it is hard for survival models to simultaneously optimize both of them especially as many previous results found improving calibration tends to diminish discrimination performance. This paper introduces a novel approach utilizing conformal regression that can improve a model's calibration without degrading discrimination. We provide theoretical guarantees for the above claim, and rigorously validate the efficiency of our approach across 11 real-world datasets, showcasing its practical applicability and robustness in diverse scenarios.
- [348] arXiv:2405.07376 [pdf, ps, other]
-
Title: Advocating Feedback Control for Human-Earth System ApplicationsSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper proposes a feedback control perspective for Human-Earth Systems (HESs) which essentially are complex systems that capture the interactions between humans and nature. Recent attention in HES research has been directed towards devising strategies for climate change mitigation and adaptation, aimed at achieving environmental and societal objectives. However, existing approaches heavily rely on HES models, which inherently suffer from inaccuracies due to the complexity of the system. Moreover, overly detailed models often prove impractical for optimization tasks. We propose a framework inheriting from feedback control strategies the robustness against model errors, because inaccuracies are mitigated using measurements retrieved from the field. The framework comprises two nested control loops. The outer loop computes the optimal inputs to the HES, which are then implemented by actuators controlled in the inner loop. Potential fields of applications are also identified.
- [349] arXiv:2405.07378 [pdf, ps, other]
-
Title: Hell is Paved with Good Intentions: The Intricate Relationship Between Cognitive Biases and Dark PatternsComments: This is the author-version of this work. The paper is submitted as a draft, as it is still under review, but was published at arXiv in support of a cumulative dissertationSubjects: Human-Computer Interaction (cs.HC)
Throughout the past decade, research in HCI has identified numerous instances of dark patterns in digital interfaces. These efforts have led to a well-fostered typology describing harmful strategies users struggle to navigate. However, an in-depth understanding of the underlying mechanisms that deceive, coerce, or manipulate users is missing. We explore the interplay between cognitive biases and dark patterns to address this gap. To that end, we conducted four focus groups with experts (N=15) in psychology and dark pattern scholarship, inquiring how they conceptualise the relation between cognitive biases and dark patterns. Based on our results, we constructed the "Relationship Model of Cognitive Biases and Dark Patterns" which illustrates how cognitive bias and deceptive design patterns relate and identifies opportune moments for ethical reconsideration and user protection mechanisms. Our insights contribute to the current discourse by emphasising ethical design decisions and their implications in the field of HCI.
- [350] arXiv:2405.07381 [pdf, ps, other]
-
Title: Networked Control with Hybrid Automatic Repeat Request ProtocolsSubjects: Information Theory (cs.IT); Optimization and Control (math.OC)
We study feedback control of a dynamical process over a lossy channel equipped with a hybrid automatic repeat request protocol that connects a sensor to an actuator. The dynamical process is modeled by a Gauss-Markov process, and the lossy channel by a packet-erasure channel with ideal feedback. We suppose that data is communicated in the format of packets with negligible quantization error. In such a networked control system, whenever a packet loss occurs, there exists a tradeoff between transmitting new sensory information with a lower success probability and retransmitting previously failed sensory information with a higher success probability. In essence, an inherent tradeoff between freshness and reliability. To address this tradeoff, we consider a linear-quadratic-regulator performance index, which penalizes state deviations and control efforts over a finite horizon, and jointly design optimal policies for an encoder and a decoder, which are collocated with the sensor and the actuator, respectively. Our emphasis here lies specifically on designing switching and control policies, rather than error-correcting codes. We derive the structural properties of the optimal encoding and decoding policies. We show that the former is a threshold switching policy and the latter is a certainty-equivalent control policy. In addition, we specify the iterative equations that the encoder and the decoder need to solve in order to implement the optimal policies.
- [351] arXiv:2405.07387 [pdf, ps, other]
-
Title: Semantic Loss Functions for Neuro-Symbolic Structured PredictionKareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den BroeckComments: Preprint of Ch. 22 "Semantic Loss Functions for Neuro-Symbolic Structured Prediction" in "Compendium of Neurosymbolic Artificial Intelligence", this https URL. arXiv admin note: substantial text overlap with arXiv:2201.11250, arXiv:2007.13197Subjects: Machine Learning (cs.LG)
Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training by minimizing the network's violation of such dependencies, steering the network towards predicting distributions satisfying the underlying structure. At the same time, it is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby, while also enabling efficient end-to-end training and inference. We also discuss key improvements and applications of the semantic loss. One limitations of the semantic loss is that it does not exploit the association of every data point with certain features certifying its membership in a target class. We should therefore prefer minimum-entropy distributions over valid structures, which we obtain by additionally minimizing the neuro-symbolic entropy. We empirically demonstrate the benefits of this more refined formulation. Moreover, the semantic loss is designed to be modular and can be combined with both discriminative and generative neural models. This is illustrated by integrating it into generative adversarial networks, yielding constrained adversarial networks, a novel class of deep generative models able to efficiently synthesize complex objects obeying the structure of the underlying domain.
- [352] arXiv:2405.07391 [pdf, ps, other]
-
Title: AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real TouchMax Yang, Chenghua Lu, Alex Church, Yijiong Lin, Chris Ford, Haoran Li, Efi Psomopoulou, David A.W. Barton, Nathan F. LeporaComments: Project website can be found at this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In-hand manipulation is an integral component of human dexterity. Our hands rely on tactile feedback for stable and reactive motions to ensure objects do not slip away unintentionally during manipulation. For a robot hand, this level of dexterity requires extracting and utilizing rich contact information for precise motor control. In this paper, we present AnyRotate, a system for gravity-invariant multi-axis in-hand object rotation using dense featured sim-to-real touch. We construct a continuous contact feature representation to provide tactile feedback for training a policy in simulation and introduce an approach to perform zero-shot policy transfer by training an observation model to bridge the sim-to-real gap. Our experiments highlight the benefit of detailed contact information when handling objects with varying properties. In the real world, we demonstrate successful sim-to-real transfer of the dense tactile policy, generalizing to a diverse range of objects for various rotation axes and hand directions and outperforming other forms of low-dimensional touch. Interestingly, despite not having explicit slip detection, rich multi-fingered tactile sensing can implicitly detect object movement within grasp and provide a reactive behavior that improves the robustness of the policy, highlighting the importance of information-rich tactile sensing for in-hand manipulation.
- [353] arXiv:2405.07392 [pdf, ps, other]
-
Title: NGD-SLAM: Towards Real-Time SLAM for Dynamic Environments without GPUComments: 12 pages, 5 figuresSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Accurate and robust camera tracking in dynamic environments presents a significant challenge for visual SLAM (Simultaneous Localization and Mapping). Recent progress in this field often involves the use of deep learning techniques to generate mask for dynamic objects, which usually require GPUs to operate in real-time (30 fps). Therefore, this paper proposes a novel visual SLAM system for dynamic environments that obtains real-time performance on CPU by incorporating a mask prediction mechanism, which allows the deep learning method and the camera tracking to run entirely in parallel at different frequencies such that neither waits for the result from the other. Based on this, it further introduces a dual-stage optical flow tracking approach and employs a hybrid usage of optical flow and ORB features, which significantly enhance the efficiency and robustness of the system. Compared with state-of-the-art methods, this system maintains high localization accuracy in dynamic environments while achieving a tracking frame rate of 56 fps on a single laptop CPU without any hardware acceleration, thus proving that deep learning methods are still feasible for dynamic SLAM even without GPU support. Based on the available information, this is the first SLAM system to achieve this.
- [354] arXiv:2405.07393 [pdf, ps, other]
-
Title: Intrinsic Fairness-Accuracy Tradeoffs under Equalized OddsSubjects: Machine Learning (cs.LG)
With the growing adoption of machine learning (ML) systems in areas like law enforcement, criminal justice, finance, hiring, and admissions, it is increasingly critical to guarantee the fairness of decisions assisted by ML. In this paper, we study the tradeoff between fairness and accuracy under the statistical notion of equalized odds. We present a new upper bound on the accuracy (that holds for any classifier), as a function of the fairness budget. In addition, our bounds also exhibit dependence on the underlying statistics of the data, labels and the sensitive group attributes. We validate our theoretical upper bounds through empirical analysis on three real-world datasets: COMPAS, Adult, and Law School. Specifically, we compare our upper bound to the tradeoffs that are achieved by various existing fair classifiers in the literature. Our results show that achieving high accuracy subject to a low-bias could be fundamentally limited based on the statistical disparity across the groups.
- [355] arXiv:2405.07395 [pdf, ps, other]
-
Title: CaFA: Global Weather Forecasting with Factorized Attention on SphereComments: PreprintSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models given their potential to capture physics of different scales from historical data and the significantly lower computational cost during the prediction stage. Renowned for its state-of-the-art performance across diverse domains, the Transformer model has also gained popularity in machine learning weather prediction. Yet applying Transformer architectures to weather forecasting, particularly on a global scale is computationally challenging due to the quadratic complexity of attention and the quadratic increase in spatial points as resolution increases. In this work, we propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. More specifically, it utilizes multi-dimensional factorized kernels that convolve over different axes where the computational complexity of the kernel is only quadratic to the axial resolution instead of overall resolution. The deterministic forecasting accuracy of the proposed model on $1.5^\circ$ and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models. We also showcase the proposed model holds great potential to push forward the Pareto front of accuracy-efficiency for Transformer weather models, where it can achieve better accuracy with less computational cost compared to Transformer based models with standard attention.
- [356] arXiv:2405.07396 [pdf, ps, other]
-
Title: An Unstructured Body-of-Revolution Electromagnetic Particle-in-Cell Algorithm with Radial Perfectly Matched Layers and Dual PolarizationsComments: This manuscript has been accepted for the publication in Computer Physics Communications COMPHY-D-23-00476R4 (May 11, 2024)Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
A novel electromagnetic particle-in-cell algorithm has been developed for fully kinetic plasma simulations on unstructured (irregular) meshes in complex body-of-revolution geometries. The algorithm, implemented in the BORPIC++ code, utilizes a set of field scalings and a coordinate mapping, reducing the Maxwell field problem in a cylindrical system to a Cartesian finite element Maxwell solver in the meridian plane. The latter obviates the cylindrical coordinate singularity in the symmetry axis. The choice of an unstructured finite element discretization enhances the geometrical flexibility of the BORPIC++ solver compared to the more traditional finite difference solvers. Symmetries in Maxwell's equations are explored to decompose the problem into two dual polarization states with isomorphic representations that enable code reuse. The particle-in-cell scatter and gather steps preserve charge-conservation at the discrete level. Our previous algorithm (BORPIC+) discretized the E and B field components of TE-phi and TM-phi polarizations on the finite element (primal) mesh. A cylindrical perfectly matched layer is implemented as a boundary condition in the radial direction to simulate open space problems, with periodic boundary conditions in the axial direction. We investigate effects of charged particles moving next to the cylindrical perfectly matched layer. We model azimuthal currents arising from rotational motion of charged rings, which produce TM-phi polarized fields. Several numerical examples are provided to illustrate the first application of the algorithm.
- [357] arXiv:2405.07399 [pdf, ps, other]
-
Title: Semi-Supervised Weed Detection for Rapid Deployment and Enhanced EfficiencyComments: 16 pages, 4 figures, 6 tables. Submitted to ElsevierSubjects: Computer Vision and Pattern Recognition (cs.CV)
Weeds present a significant challenge in agriculture, causing yield loss and requiring expensive control measures. Automatic weed detection using computer vision and deep learning offers a promising solution. However, conventional deep learning methods often require large amounts of labelled training data, which can be costly and time-consuming to acquire. This paper introduces a novel method for semi-supervised weed detection, comprising two main components. Firstly, a multi-scale feature representation technique is employed to capture distinctive weed features across different scales. Secondly, we propose an adaptive pseudo-label assignment strategy, leveraging a small set of labelled images during training. This strategy dynamically assigns confidence scores to pseudo-labels generated from unlabeled data. Additionally, our approach integrates epoch-corresponding and mixed pseudo-labels to further enhance the learning process. Experimental results on the COCO dataset and five prominent weed datasets -- CottonWeedDet12, CropAndWeed, Palmer amaranth, RadishWheat, and RoboWeedMap -- illustrate that our method achieves state-of-the-art performance in weed detection, even with significantly less labelled data compared to existing techniques. This approach holds the potential to alleviate the labelling burden and enhance the feasibility and deployment speed of deep learning for weed detection in real-world agricultural scenarios.
- [358] arXiv:2405.07404 [pdf, ps, other]
-
Title: Indoor PM2.5 forecasting and the association with outdoor air pollution: a modelling study based on sensor data in AustraliaSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Exposure to poor indoor air quality poses significant health risks, necessitating thorough assessment to mitigate associated dangers. This study aims to predict hourly indoor fine particulate matter (PM2.5) concentrations and investigate their correlation with outdoor PM2.5 levels across 24 distinct buildings in Australia. Indoor air quality data were gathered from 91 monitoring sensors in eight Australian cities spanning 2019 to 2022. Employing an innovative three-stage deep ensemble machine learning framework (DEML), comprising three base models (Support Vector Machine, Random Forest, and eXtreme Gradient Boosting) and two meta-models (Random Forest and Generalized Linear Model), hourly indoor PM2.5 concentrations were predicted. The model's accuracy was evaluated using a rolling windows approach, comparing its performance against three benchmark algorithms (SVM, RF, and XGBoost). Additionally, a correlation analysis assessed the relationship between indoor and outdoor PM2.5 concentrations. Results indicate that the DEML model consistently outperformed benchmark models, achieving an R2 ranging from 0.63 to 0.99 and RMSE from 0.01 to 0.663 mg/m3 for most sensors. Notably, outdoor PM2.5 concentrations significantly impacted indoor air quality, particularly evident during events like bushfires. This study underscores the importance of accurate indoor air quality prediction, crucial for developing location-specific early warning systems and informing effective interventions. By promoting protective behaviors, these efforts contribute to enhanced public health outcomes.
- [359] arXiv:2405.07406 [pdf, ps, other]
-
Title: Machine Unlearning: A Comprehensive SurveySubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
As the right to be forgotten has been legislated worldwide, many studies attempt to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning is to make a trained model to remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine unlearning and discuss their differences, connections and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we use two parts to introduce: firstly, we classify centralized unlearning into exact unlearning and approximate unlearning; secondly, we offer a detailed introduction to the techniques of these methods. Besides the centralized unlearning, we notice some studies about distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as the two representative directions. After introducing unlearning methods, we review studies about unlearning verification. Moreover, we consider the privacy and security issues essential in machine unlearning and organize the latest related literature. Finally, we discuss the challenges of various unlearning scenarios and address the potential research directions.
- [360] arXiv:2405.07407 [pdf, ps, other]
-
Title: PitcherNet: Powering the Moneyball Evolution in Baseball Video AnalyticsComments: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'24)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
In the high-stakes world of baseball, every nuance of a pitcher's mechanics holds the key to maximizing performance and minimizing runs. Traditional analysis methods often rely on pre-recorded offline numerical data, hindering their application in the dynamic environment of live games. Broadcast video analysis, while seemingly ideal, faces significant challenges due to factors like motion blur and low resolution. To address these challenges, we introduce PitcherNet, an end-to-end automated system that analyzes pitcher kinematics directly from live broadcast video, thereby extracting valuable pitch statistics including velocity, release point, pitch position, and release extension. This system leverages three key components: (1) Player tracking and identification by decoupling actions from player kinematics; (2) Distribution and depth-aware 3D human modeling; and (3) Kinematic-driven pitch statistics. Experimental validation demonstrates that PitcherNet achieves robust analysis results with 96.82% accuracy in pitcher tracklet identification, reduced joint position error by 1.8mm and superior analytics compared to baseline methods. By enabling performance-critical kinematic analysis from broadcast video, PitcherNet paves the way for the future of baseball analytics by optimizing pitching strategies, preventing injuries, and unlocking a deeper understanding of pitcher mechanics, forever transforming the game.
- [361] arXiv:2405.07409 [pdf, ps, other]
-
Title: ZBanner: Fast Stateless Scanning Capable of Obtaining Responses over TCPComments: The paper has been submitted and the code will be published laterSubjects: Networking and Internet Architecture (cs.NI)
Fast large-scale network scanning is an important way to understand internet service configurations and security in real time, among which stateless scan is representative. Existing stateless scanners can perform single-packet scans for internet-wide network measurements but are limited to host discovery or port scanning. To obtain further information over TCP, slower stateful scanners must be used in conjunction which spend more time and memory because of connection state maintenance. Through simplifying TCP finite state machine, this paper proposes a novel stateless scanning model, which can establish TCP connections and obtain further responses in a completely stateless manner. Based on this model, we implement ZBanner, an improved modular stateless scanner that utilizes user-defined probes for identifying services and versions, fingerprinting TLS servers, etc. We present unique design of ZBanner and experimentally characterize its feasibility and performance. Experiments show that ZBanner performs better than current state-of-the-art solutions in terms of scan rate and memory usage. ZBanner achieves at least three times faster than current tools for generic ports and over 90 times faster for open ports while keeping a minimum and stable memory usage.
- [362] arXiv:2405.07411 [pdf, ps, other]
-
Title: MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging TasksSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Medical images are often more difficult to acquire than natural images due to the specialism of the equipment and technology, which leads to less medical image datasets. So it is hard to train a strong pretrained medical vision model. How to make the best of natural pretrained vision model and adapt in medical domain still pends. For image classification, a popular method is linear probe (LP). However, LP only considers the output after feature extraction. Yet, there exists a gap between input medical images and natural pretrained vision model. We introduce visual prompting (VP) to fill in the gap, and analyze the strategies of coupling between LP and VP. We design a joint learning loss function containing categorisation loss and discrepancy loss, which describe the variance of prompted and plain images, naming this joint training strategy MoVL (Mixture of Visual Prompting and Linear Probe). We experiment on 4 medical image classification datasets, with two mainstream architectures, ResNet and CLIP. Results shows that without changing the parameters and architecture of backbone model and with less parameters, there is potential for MoVL to achieve full finetune (FF) accuracy (on four medical datasets, average 90.91% for MoVL and 91.13% for FF). On out of distribution medical dataset, our method(90.33%) can outperform FF (85.15%) with absolute 5.18 % lead.
- [363] arXiv:2405.07412 [pdf, ps, other]
-
Title: Non-intrusive optimal experimental design for large-scale nonlinear Bayesian inverse problems using a Bayesian approximation error approachSubjects: Numerical Analysis (math.NA)
We consider optimal experimental design (OED) for nonlinear inverse problems within the Bayesian framework. Optimizing the data acquisition process for large-scale nonlinear Bayesian inverse problems is a computationally challenging task since the posterior is typically intractable and commonly-encountered optimality criteria depend on the observed data. Since these challenges are not present in OED for linear Bayesian inverse problems, we propose an approach based on first linearizing the associated forward problem and then optimizing the experimental design. Replacing an accurate but costly model with some linear surrogate, while justified for certain problems, can lead to incorrect posteriors and sub-optimal designs if model discrepancy is ignored. To avoid this, we use the Bayesian approximation error (BAE) approach to formulate an A-optimal design objective for sensor selection that is aware of the model error. In line with recent developments, we prove that this uncertainty-aware objective is independent of the exact choice of linearization. This key observation facilitates the formulation of an uncertainty-aware OED objective function using a completely trivial linear map, the zero map, as a surrogate to the forward dynamics. The base methodology is also extended to marginalized OED problems, accommodating uncertainties arising from both linear approximations and unknown auxiliary parameters. Our approach only requires parameter and data sample pairs, hence it is particularly well-suited for black box forward models. We demonstrate the effectiveness of our method for finding optimal designs in an idealized subsurface flow inverse problem and for tsunami detection.
- [364] arXiv:2405.07414 [pdf, ps, other]
-
Title: Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular DomainsComments: ICML 2024, 18 pages (including supplementary materials)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address the challenges in the self-supervised learning framework, we propose a novel pretext task based on the classical binning method. The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values. This pretext task provides the encoder with an inductive bias to capture the irregular dependencies, mapping from continuous inputs to discretized bins, and mitigates the feature heterogeneity by setting all features to have category-type targets. Our empirical investigations ascertain several advantages of binning: capturing the irregular function, compatibility with encoder architecture and additional modifications, standardizing all features into equal sets, grouping similar values within a feature, and providing ordering information. Comprehensive evaluations across diverse tabular datasets corroborate that our method consistently improves tabular representation learning performance for a wide range of downstream tasks. The codes are available in this https URL.
- [365] arXiv:2405.07415 [pdf, ps, html, other]
-
Title: Structured Reinforcement Learning for Incentivized Stochastic Covert OptimizationSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
This paper studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such problems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and perform SG. The oracle probabilistically returns either a noisy gradient of the function} or a non-informative measurement, depending on the oracle state and incentive. The learner's query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This paper formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using conditions for interval-dominance on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi-armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.
- [366] arXiv:2405.07417 [pdf, ps, html, other]
-
Title: Identifying Hate Speech Peddlers in Online Platforms. A Bayesian Social Learning Approach for Large Language Model Driven Decision-MakersSubjects: Social and Information Networks (cs.SI); Signal Processing (eess.SP)
This paper studies the problem of autonomous agents performing Bayesian social learning for sequential detection when the observations of the state belong to a high-dimensional space and are expensive to analyze. Specifically, when the observations are textual, the Bayesian agent can use a large language model (LLM) as a map to get a low-dimensional private observation. The agent performs Bayesian learning and takes an action that minimizes the expected cost and is visible to subsequent agents. We prove that a sequence of such Bayesian agents herd in finite time to the public belief and take the same action disregarding the private observations. We propose a stopping time formulation for quickest time herding in social learning and optimally balance privacy and herding. Structural results are shown on the threshold nature of the optimal policy to the stopping time problem. We illustrate the application of our framework when autonomous Bayesian detectors aim to sequentially identify if a user is a hate speech peddler on an online platform by parsing text observations using an LLM. We numerically validate our results on real-world hate speech datasets. We show that autonomous Bayesian agents designed to flag hate speech peddlers in online platforms herd and misclassify the users when the public prior is strong. We also numerically show the effect of a threshold policy in delaying herding.
- [367] arXiv:2405.07418 [pdf, ps, other]
-
Title: Exploring the Effects of User-Agent and User-Designer Similarity in Virtual Human Design to Promote Mental Health Intentions for College StudentsPedro Guillermo Feijóo-García, Chase Wrenn, Alexandre Gomes de Siqueira, Rashi Ghosh, Jacob Stuart, Heng Yao, Benjamin LokComments: 43 pages, 12 figures, under review for publication at ACM Transactions on Applied PerceptionSubjects: Human-Computer Interaction (cs.HC)
Virtual humans (i.e., embodied conversational agents) have the potential to support college students' mental health, particularly in Science, Technology, Engineering, and Mathematics (STEM) fields where students are at a heightened risk of mental disorders such as anxiety and depression. A comprehensive understanding of students, considering their cultural characteristics, experiences, and expectations, is crucial for creating timely and effective virtual human interventions. To this end, we conducted a user study with 481 computer science students from a major university in North America, exploring how they co-designed virtual humans to support mental health conversations for students similar to them. Our findings suggest that computer science students who engage in co-design processes of virtual humans tend to create agents that closely resemble them demographically--agent-designer demographic similarity. Key factors influencing virtual human design included age, gender, ethnicity, and the matching between appearance and voice. We also observed that the demographic characteristics of virtual human designers, especially ethnicity and gender, tend to be associated with those of the virtual humans they designed. Finally, we provide insights concerning the impact of user-designer demographic similarity in virtual humans' effectiveness in promoting mental health conversations when designers' characteristics are shared explicitly or implicitly. Understanding how virtual humans' characteristics serve users' experiences in mental wellness conversations and the similarity-attraction effects between agents, users, and designers may help tailor virtual humans' design to enhance their acceptance and increase their counseling effectiveness.
- [368] arXiv:2405.07419 [pdf, ps, other]
-
Title: Indoor and Outdoor Crowd Density Level Estimation with Video Analysis through Machine Learning ModelsSubjects: Cryptography and Security (cs.CR)
Crowd density level estimation is an essential aspect of crowd safety since it helps to identify areas of probable overcrowding and required conditions. Nowadays, AI systems can help in various sectors. Here for safety purposes or many for public service crowd detection, tracking or estimating crowd level is essential. So we decided to build an AI project to fulfil the purpose. This project can detect crowds from images, videos, or webcams. From these images, videos, or webcams, this system can detect, track and identify humans. This system also can estimate the crowd level. Though this project is simple, it is very effective, user-friendly, and less costly. Also, we trained our system with a dataset. So our system also can predict the crowd. Though the AI system is not a hundred percent accurate, this project is more than 97 percent accurate. We also represent the dataset in a graphical way.
- [369] arXiv:2405.07423 [pdf, ps, other]
-
Title: RoboCAP: Robotic Classification and Precision Pouring of Diverse Liquids and Granular Media with Capacitive SensingYexin Hu, Alexandra Gillespie, Akhil Padmanabha, Kavya Puthuveetil, Wesley Lewis, Karan Khokar, Zackory EricksonSubjects: Robotics (cs.RO)
Liquids and granular media are pervasive throughout human environments, yet remain particularly challenging for robots to sense and manipulate precisely. In this work, we present a systematic approach at integrating capacitive sensing within robotic end effectors to enable robust sensing and precise manipulation of liquids and granular media. We introduce the parallel-jaw RoboCAP Gripper with embedded capacitive sensing arrays that enable a robot to directly sense the materials and dynamics of liquids inside of diverse containers, including some visually opaque. When coupled with model-based control, we demonstrate that the proposed system enables a robotic manipulator to achieve state-of-the-art precision pouring accuracy for a range of substances with varying dynamics properties. Code, designs, and build details are available on the project website.
- [370] arXiv:2405.07425 [pdf, ps, html, other]
-
Title: Sakuga-42M Dataset: Scaling Up Cartoon ResearchComments: Arxiv Pre-print. Work in ProgressSubjects: Computer Vision and Pattern Recognition (cs.CV)
Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a notable bias in hand-drawn cartoons that diverges from the distribution of natural videos. Can we harness the success of the scaling paradigm to benefit cartoon research? Unfortunately, until now, there has not been a sizable cartoon dataset available for exploration. In this research, we propose the Sakuga-42M Dataset, the first large-scale cartoon animation dataset. Sakuga-42M comprises 42 million keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, content taxonomies, etc. We pioneer the benefits of such a large-scale cartoon dataset on comprehension and generation tasks by finetuning contemporary foundation models like Video CLIP, Video Mamba, and SVD, achieving outstanding performance on cartoon-related tasks. Our motivation is to introduce large-scaling to cartoon research and foster generalization and robustness in future cartoon applications. Dataset, Code, and Pretrained Models will be publicly available.
- [371] arXiv:2405.07429 [pdf, ps, other]
-
Title: JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose EstimationComments: 8 pagesSubjects: Robotics (cs.RO)
Unmanned aerial vehicles (UAVs) visual localization in planetary aims to estimate the absolute pose of the UAV in the world coordinate system through satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning will be reduced. In order to accurately determine the position of the UAV in a planetary scene in the absence of the global navigation satellite system (GNSS), this paper proposes JointLoc, which estimates the real-time UAV position in the world coordinate system by adaptively fusing the absolute 2-degree-of-freedom (2-DoF) pose and the relative 6-degree-of-freedom (6-DoF) pose. Extensive comparative experiments were conducted on a proposed planetary UAV image cross-modal localization dataset, which contains three types of typical Martian topography generated via a simulation engine as well as real Martian UAV images from the Ingenuity helicopter. JointLoc achieved a root-mean-square error of 0.237m in the trajectories of up to 1,000m, compared to 0.594m and 0.557m for ORB-SLAM2 and ORB-SLAM3 respectively. The source code will be available at this https URL.
- [372] arXiv:2405.07430 [pdf, ps, other]
-
Title: Don't Chase Your Tail! Missing Key Aspects Augmentation in Textual Vulnerability Descriptions of Long-tail Software through Feature InferenceSubjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Augmenting missing key aspects in Textual Vulnerability Descriptions (TVDs) for software with a large user base (referred to as non-long-tail software) has greatly advanced vulnerability analysis and software security research. However, these methods often overlook software instances that have a limited user base (referred to as long-tail software) due to limited TVDs, variations in software features, and domain-specific jargon, which hinders vulnerability analysis and software repairs. In this paper, we introduce a novel software feature inference framework designed to augment the missing key aspects of TVDs for long-tail software. Firstly, we tackle the issue of non-standard software names found in community-maintained vulnerability databases by cross-referencing government databases with Common Vulnerabilities and Exposures (CVEs). Next, we employ Large Language Models (LLMs) to generate the missing key aspects. However, the limited availability of historical TVDs restricts the variety of examples. To overcome this limitation, we utilize the Common Weakness Enumeration (CWE) to classify all TVDs and select cluster centers as representative examples. To ensure accuracy, we present Natural Language Inference (NLI) models specifically designed for long-tail software. These models identify and eliminate incorrect responses. Additionally, we use a wiki repository to provide explanations for proprietary terms. Our evaluations demonstrate that our approach significantly improves the accuracy of augmenting missing key aspects of TVDs for log-tail software from 0.27 to 0.56 (+107%). Interestingly, the accuracy of non-long-tail software also increases from 64% to 71%. As a result, our approach can be useful in various downstream tasks that require complete TVD information.
- [373] arXiv:2405.07434 [pdf, ps, other]
-
Title: Concurrent aggregate queriesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Concurrent data structures serve as fundamental building blocks for concurrent computing. Many concurrent counterparts have been designed for basic sequential mechanisms; however, one notable omission is a concurrent tree that supports aggregate queries. Aggregate queries essentially compile succinct information about a range of data items, for example, calculating the average salary of employees in their 30s. Such queries play an essential role in various applications and are commonly taught in undergraduate data structures courses. In this paper, we formalize a type of aggregate queries that can be efficiently supported by concurrent trees and present a design for implementing these queries on concurrent trees. We bring two algorithms implementing this design, where one optimizes for tree update time, while the other optimizes for aggregate query time. We analyze their correctness and complexity, demonstrating the trade-offs between query time and update time.
- [374] arXiv:2405.07435 [pdf, ps, other]
-
Title: An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-AttentionComments: This manuscript is under peer reviewSubjects: Computational Engineering, Finance, and Science (cs.CE)
Today, the acquisition of various behavioral log data has enabled deeper understanding of customer preferences and future behaviors in the marketing field. In particular, multimodal deep learning has achieved highly accurate predictions by combining multiple types of data. Many of these studies utilize with feature fusion to construct multimodal models, which combines extracted representations from each modality. However, since feature fusion treats information from each modality equally, it is difficult to perform flexible analysis such as the attention mechanism that has been used extensively in recent years. Therefore, this study proposes a context-aware multimodal deep learning model that combines Bidirectional Encoder Representations from Transformers (BERT) and cross-attention Transformer, which dynamically changes the attention of deep-contextualized word representations based on background information such as consumer demographic and lifestyle variables. We conduct a comprehensive analysis and demonstrate the effectiveness of our model by comparing it with six reference models in three categories using behavioral logs stored on an online platform. In addition, we present an efficient multimodal learning method by comparing the learning efficiency depending on the optimizers and the prediction accuracy depending on the number of tokens in the text data.
- [375] arXiv:2405.07436 [pdf, ps, other]
-
Title: Can Language Models Explain Their Own Classification Behavior?Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large language models (LLMs) perform well at a myriad of tasks, but explaining the processes behind this performance is a challenge. This paper investigates whether LLMs can give faithful high-level explanations of their own internal processes. To explore this, we introduce a dataset, ArticulateRules, of few-shot text-based classification tasks generated by simple rules. Each rule is associated with a simple natural-language explanation. We test whether models that have learned to classify inputs competently (both in- and out-of-distribution) are able to articulate freeform natural language explanations that match their classification behavior. Our dataset can be used for both in-context and finetuning evaluations. We evaluate a range of LLMs, demonstrating that articulation accuracy varies considerably between models, with a particularly sharp increase from GPT-3 to GPT-4. We then investigate whether we can improve GPT-3's articulation accuracy through a range of methods. GPT-3 completely fails to articulate 7/10 rules in our test, even after additional finetuning on correct explanations. We release our dataset, ArticulateRules, which can be used to test self-explanation for LLMs trained either in-context or by finetuning.
- [376] arXiv:2405.07437 [pdf, ps, other]
-
Title: Evaluation of Retrieval-Augmented Generation: A SurveySubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Retrieval-Augmented Generation (RAG) has emerged as a pivotal innovation in natural language processing, enhancing generative models by incorporating external information retrieval. Evaluating RAG systems, however, poses distinct challenges due to their hybrid structure and reliance on dynamic knowledge sources. We consequently enhanced an extensive survey and proposed an analysis framework for benchmarks of RAG systems, RAGR (Retrieval, Generation, Additional Requirement), designed to systematically analyze RAG benchmarks by focusing on measurable outputs and established truths. Specifically, we scrutinize and contrast multiple quantifiable metrics of the Retrieval and Generation component, such as relevance, accuracy, and faithfulness, of the internal links within the current RAG evaluation methods, covering the possible output and ground truth pairs. We also analyze the integration of additional requirements of different works, discuss the limitations of current benchmarks, and propose potential directions for further research to address these shortcomings and advance the field of RAG evaluation. In conclusion, this paper collates the challenges associated with RAG evaluation. It presents a thorough analysis and examination of existing methodologies for RAG benchmark design based on the proposed RGAR framework.
- [377] arXiv:2405.07438 [pdf, ps, other]
-
Title: Towards improved software visualisation of parameterised REE patterns: Introducing REEkit for geological analysisSubjects: Human-Computer Interaction (cs.HC)
Modern geological studies and mineral exploration techniques rely heavily on being able to digitally visualise and interpret data. Rare earth elements (REEs) are vital for renewable energy technologies. REE concentrations, when normalised to a standard material, show unique geometric curves (or patterns) in geological samples due to their similar chemical properties. The lambda technique can be used to describe these patterns and turn them into points - making it easier to visualise and interpret larger datasets. Lambdas have the potential to help industry understand intricate sample relationships and the geological and economic importance of their data.
This study explored the use of lambdas through the evaluation of various visualisation methods to determine their usefulness in mineral exploration. The 'REEkit' platform facilitated the evaluation of the different visualisation methods and gauged industry interest and acceptance of such a service. Qualitative data was gathered through contextual inquiry, utilising semi-structured interviews and an observational session with 10 participants. Conceptual thematic analysis was applied to extract key findings.
This study found that two critical factors for successful lambda data visualisation in the mineral exploration industry are familiarity and clarity: visualisations that were familiar and commonplace for users allowed for better analysis and clear communication to non-technical audiences. This included visualisations such as the 3D scatter plot and scatter plot matrix. Furthermore, visualisations that complemented each other and seamlessly integrated into the same workflow provided diverse perspectives on the data. Important aspects included understanding population grouping versus data distribution, achieved through combinations such as scatter plot and density contour plot, or 3D scatter plot and violin plot. - [378] arXiv:2405.07440 [pdf, ps, html, other]
-
Title: Maximizing Information Gain in Privacy-Aware Active Learning of Email AnomaliesMu-Huan Miles Chung, Sharon Li, Jaturong Kongmanee, Lu Wang, Yuhong Yang, Calvin Giang, Khilan Jerath, Abhay Raman, David Lie, Mark ChignellComments: arXiv admin note: substantial text overlap with arXiv:2303.00870Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Redacted emails satisfy most privacy requirements but they make it more difficult to detect anomalous emails that may be indicative of data exfiltration. In this paper we develop an enhanced method of Active Learning using an information gain maximizing heuristic, and we evaluate its effectiveness in a real world setting where only redacted versions of email could be labeled by human analysts due to privacy concerns. In the first case study we examined how Active Learning should be carried out. We found that model performance was best when a single highly skilled (in terms of the labelling task) analyst provided the labels. In the second case study we used confidence ratings to estimate the labeling uncertainty of analysts and then prioritized instances for labeling based on the expected information gain (the difference between model uncertainty and analyst uncertainty) that would be provided by labelling each instance. We found that the information maximization gain heuristic improved model performance over existing sampling methods for Active Learning. Based on the results obtained, we recommend that analysts should be screened, and possibly trained, prior to implementation of Active Learning in cybersecurity applications. We also recommend that the information gain maximizing sample method (based on expert confidence) should be used in early stages of Active Learning, providing that well-calibrated confidence can be obtained. We also note that the expertise of analysts should be assessed prior to Active Learning, as we found that analysts with lower labelling skill had poorly calibrated (over-) confidence in their labels.
- [379] arXiv:2405.07441 [pdf, ps, other]
-
Title: Reducing Spatial Discretization Error on Coarse CFD Simulations Using an OpenFOAM-Embedded Deep Learning FrameworkSubjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
We propose a method for reducing the spatial discretization error of coarse computational fluid dynamics (CFD) problems by enhancing the quality of low-resolution simulations using a deep learning model fed with high-quality data. We substitute the default differencing scheme for the convection term by a feed-forward neural network that interpolates velocities from cell centers to face values to produce velocities that approximate the fine-mesh data well. The deep learning framework incorporates the open-source CFD code OpenFOAM, resulting in an end-to-end differentiable model. We automatically differentiate the CFD physics using a discrete adjoint code version. We present a fast communication method between TensorFlow (Python) and OpenFOAM (c++) that accelerates the training process. We applied the model to the flow past a square cylinder problem, reducing the error to about 50% for simulations outside the training distribution compared to the traditional solver in the x- and y-velocity components using an 8x coarser mesh. The training is affordable in terms of time and data samples since the architecture exploits the local features of the physics while generating stable predictions for mid-term simulations.
- [380] arXiv:2405.07442 [pdf, ps, other]
-
Title: Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory DiseasesSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
This study presents a novel methodology utilizing a pre-trained speech recognition model for processing respiratory sound data. By incorporating medical record information, we introduce an innovative multi-modal deep-learning architecture, named Rene, which addresses the challenges of poor interpretability and underperformance in real-time clinical diagnostic response observed in previous respiratory disease-focused models. The proposed Rene architecture demonstrated significant improvements of 10.24%, 16.15%, 15.29%, and 18.90% respectively, compared to the baseline across four tasks related to respiratory event detection and audio record classification on the SPRSound database. In patient disease prediction tests on the ICBHI database, the architecture exhibited improvements of 23% in the mean of average score and harmonic score compared to the baseline. Furthermore, we developed a real-time respiratory sound discrimination system based on the Rene architecture, featuring a dual-thread design and compressed model parameters for simultaneous microphone recording and real-time dynamic decoding. Employing state-of-the-art Edge AI technology, this system enables rapid and accurate responses for respiratory sound auscultation, facilitating deployment on wearable clinical detection devices to capture incremental data, which can be synergistically evolved with large-scale models deployed on cloud servers for downstream tasks.
- [381] arXiv:2405.07443 [pdf, ps, other]
-
Title: Minimum-Variance Recursive State Estimation for 2-D Systems: When Asynchronous Multi-Channel Delays meet Energy Harvesting ConstraintsSubjects: Systems and Control (eess.SY)
This paper is concerned with the state estimation problem for two-dimensional systems with asynchronous multichannel delays and energy harvesting constraints. In the system, each smart sensor has a certain probability of harvesting energy from the external environment, the authorized transmission between the sensor and the remote filter is contingent upon the current energy level of the sensor, which results in intermittent transmission of observation information. Addressing the issue of incomplete observation information due to asynchronous multi-channel delays, a novel approach for observation partition reconstruction is proposed to convert the delayed activated observation sequences into equivalent delay-free activated observation sequences. Through generating spatial equivalency validation, it is found that the reconstructed delay-free activated observation sequences contain the same information as the original delayed activated observation sequences. Based on the reconstructed activated observation sequence and activated probability, a novel unbiased h+1-step recursive estimator is constructed. Then, the evolution of the probability distribution of the energy level is discussed. The estimation gains are obtained by minimizing the filtering error covariance. Subsequently, through parameter assumptions, a uniform lower bound and a recursive upper bound for the filtering error covariance are presented. And the monotonicity analysis of activated probability on estimation performance is given. Finally, the effectiveness of the proposed estimation scheme is verified through a numerical simulation example.
- [382] arXiv:2405.07444 [pdf, ps, html, other]
-
Title: Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and ReconstructionComments: 17 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
In the character animation field, modern supervised keyframe interpolation models have demonstrated exceptional performance in constructing natural human motions from sparse pose definitions. As supervised models, large motion datasets are necessary to facilitate the learning process; however, since motion is represented with fixed hierarchical skeletons, such datasets are incompatible for skeletons outside the datasets' native configurations. Consequently, the expected availability of a motion dataset for desired skeletons severely hinders the feasibility of learned interpolation in practice. To combat this limitation, we propose Point Cloud-based Motion Representation Learning (PC-MRL), an unsupervised approach to enabling cross-compatibility between skeletons for motion interpolation learning. PC-MRL consists of a skeleton obfuscation strategy using temporal point cloud sampling, and an unsupervised skeleton reconstruction method from point clouds. We devise a temporal point-wise K-nearest neighbors loss for unsupervised learning. Moreover, we propose First-frame Offset Quaternion (FOQ) and Rest Pose Augmentation (RPA) strategies to overcome necessary limitations of our unsupervised point cloud-to-skeletal motion process. Comprehensive experiments demonstrate the effectiveness of PC-MRL in motion interpolation for desired skeletons without supervision from native datasets.
- [383] arXiv:2405.07445 [pdf, ps, other]
-
Title: Cybathlon -- Legged Mobile Assistance for QuadriplegicsSubjects: Robotics (cs.RO)
Assistance robots are the future for people who need daily care due to limited mobility or being wheelchair-bound. Current solutions of attaching robotic arms to motorized wheelchairs only provide limited additional mobility at the cost of increased size. We present a mouth joystick control interface, augmented with voice commands, for an independent quadrupedal assistance robot with an arm. We validate and showcase our system in the Cybathlon Challenges February 2024 Assistance Robot Race, where we solve four everyday tasks in record time, winning first place. Our system remains generic and sets the basis for a platform that could help and provide independence in the everyday lives of people in wheelchairs.
- [384] arXiv:2405.07447 [pdf, ps, other]
-
Title: From traces to measures: Large language models as a tool for psychological measurement from textComments: 12 pages, 2 figures, 1 tableSubjects: Human-Computer Interaction (cs.HC)
Digital trace data provide potentially valuable resources for understanding human behaviour, but their value has been limited by issues of unclear measurement. The growth of large language models provides an opportunity to address this limitation in the case of text data. Specifically, recognizing cases where their responses are a form of psychological measurement (the use of observable indicators to assess an underlying construct) allows existing measures and accuracy assessment frameworks from psychology to be re-purposed to use with large language models. Based on this, we offer four methodological recommendations for using these models to quantify text features: (1) identify the target of measurement, (2) use multiple prompts, (3) assess internal consistency, and (4) treat evaluation metrics (such as human annotations) as expected correlates rather than direct ground-truth measures. Additionally, we provide a workflow for implementing this approach.
- [385] arXiv:2405.07448 [pdf, ps, other]
-
Title: Evaluating the Language-Based Security for Plugin DevelopmentSubjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
With the increasing popularity of plugin-based software systems, ensuring the security of plugins has become a critical concern. When users install plugins or browse websites with plugins from an untrusted source, how can we be sure that they do have any undesirable functions implicitly? In this research, we present a comprehensive study on language-based security mechanisms for plugin development. We aim to enhance the understanding of access control vulnerabilities in plugins and explore effective security measures by introducing a capability-based system. We also developed and evaluated test plugins to assess the security mechanisms in popular development environments such as IntelliJ IDEA and Visual Studio Code by utilising Java, JavaScript, and associated APIs and frameworks. We also explore the concept of capability-based module systems as an alternative approach to plugin security. A comparative analysis is conducted to evaluate the effectiveness of capability-based systems in addressing access control vulnerabilities identified in earlier sections. Finally, recommendations for improving plugin security practices and tools will be presented, emphasizing the importance of robust security measures in the ever-evolving landscape of software plugins.
- [386] arXiv:2405.07450 [pdf, ps, other]
-
Title: Locality-Preserving Free-Form DeformationSubjects: Graphics (cs.GR)
This paper proposes a method to estimate the locations of grid handles in free-form deformation (FFD) while preserving the local shape characteristics of the 2D/3D input model embedded into the grid, named locality-preserving FFD (lp-FFD). Users first specify some vertex locations in the input model and grid handle locations. The system then optimizes all locations of grid handles by minimizing the distortion of the input model's mesh elements. The proposed method is fast and stable, allowing the user to directly and indirectly make the deformed shape of mesh model and grid. This paper shows some examples of deformation results to demonstrate the robustness of our lp-FFD. In addition, we conducted a user study and confirm our lp-FFD's efficiency and effectiveness in shape deformation is higher than those of existing methods used in commercial software.
- [387] arXiv:2405.07451 [pdf, ps, other]
-
Title: CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question AnsweringComments: Submitted to the Journal on February 6, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
While vision-language pretrained models (VLMs) excel in various multimodal understanding tasks, their potential in fine-grained audio-visual reasoning, particularly for audio-visual question answering (AVQA), remains largely unexplored. AVQA presents specific challenges for VLMs due to the requirement of visual understanding at the region level and seamless integration with audio modality. Previous VLM-based AVQA methods merely used CLIP as a feature encoder but underutilized its knowledge, and mistreated audio and video as separate entities in a dual-stream framework as most AVQA methods. This paper proposes a new CLIP-powered target-aware single-stream (TASS) network for AVQA using the image-text matching knowledge of the pretrained model through the audio-visual matching characteristic of nature. It consists of two key components: the target-aware spatial grounding module (TSG+) and the single-stream joint temporal grounding module (JTG). Specifically, we propose a TSG+ module to transfer the image-text matching knowledge from CLIP models to our region-text matching process without corresponding ground-truth labels. Moreover, unlike previous separate dual-stream networks that still required an additional audio-visual fusion module, JTG unifies audio-visual fusion and question-aware temporal grounding in a simplified single-stream architecture. It treats audio and video as a cohesive entity and further extends the pretrained image-text knowledge to audio-text matching by preserving their temporal correlation with our proposed cross-modal synchrony (CMS) loss. Extensive experiments conducted on the MUSIC-AVQA benchmark verified the effectiveness of our proposed method over existing state-of-the-art methods.
- [388] arXiv:2405.07453 [pdf, ps, other]
-
Title: An Effectiveness Study Across Baseline and Neural Network-based Force Estimation Methods on the da Vinci Research Kit Si SystemComments: Accepted by the Hamlyn Symposium on Medical Robotics 2024Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
In this study, we further investigate the robustness and generalization ability of an neural network (NN) based force estimation method, using the da Vinci Research Kit Si (dVRK-Si). To evaluate our method's performance, we compare the force estimation accuracy with several baseline methods. We conduct comparative studies between the dVRK classic and dVRK-Si systems to benchmark the effectiveness of these approaches.
We conclude that the NN-based method provides comparable force estimation accuracy across the two systems, as the average root mean square error (RMSE) over the average range of force ratio is approximately 3.07% for the dVRK classic, and 5.27% for the dVRK-Si. On the dVRK-Si, the force estimation RMSEs for all the baseline methods are 2 to 4 times larger than the NN-based method in all directions. One possible reason is, we made assumptions in the baseline methods that static forces remain the same or dynamics is time-invariant. These assumptions may hold for the dVRK Classic, as it has pre-loaded weight and maintains horizontal self balance. Since the dVRK-Si configuration does not have this property, assumptions do not hold anymore, therefore the NN-based method significantly outperforms. - [389] arXiv:2405.07454 [pdf, ps, other]
-
Title: On Securing Analog Lagrange Coded Computing from Colluding AdversariesComments: To appear in the proceedings of IEEE ISIT 2024Subjects: Information Theory (cs.IT)
Analog Lagrange Coded Computing (ALCC) is a recently proposed coded computing paradigm wherein certain computations over analog datasets can be efficiently performed using distributed worker nodes through floating point implementation. While ALCC is known to preserve privacy of data from the workers, it is not resilient to adversarial workers that return erroneous computation results. Pointing at this security vulnerability, we focus on securing ALCC from a wide range of non-colluding and colluding adversarial workers. As a foundational step, we make use of error-correction algorithms for Discrete Fourier Transform (DFT) codes to build novel algorithms to nullify the erroneous computations returned from the adversaries. Furthermore, when such a robust ALCC is implemented in practical settings, we show that the presence of precision errors in the system can be exploited by the adversaries to propose novel colluding attacks to degrade the computation accuracy. As the main takeaway, we prove a counter-intuitive result that not all the adversaries should inject noise in their computations in order to optimally degrade the accuracy of the ALCC framework. This is the first work of its kind to address the vulnerability of ALCC against colluding adversaries.
- [390] arXiv:2405.07456 [pdf, ps, other]
-
Title: Boosting House Price Estimations with Multi-Head Gated AttentionSubjects: Machine Learning (cs.LG)
Evaluating house prices is crucial for various stakeholders, including homeowners, investors, and policymakers. However, traditional spatial interpolation methods have limitations in capturing the complex spatial relationships that affect property values. To address these challenges, we have developed a new method called Multi-Head Gated Attention for spatial interpolation. Our approach builds upon attention-based interpolation models and incorporates multiple attention heads and gating mechanisms to capture spatial dependencies and contextual information better. Importantly, our model produces embeddings that reduce the dimensionality of the data, enabling simpler models like linear regression to outperform complex ensembling models. We conducted extensive experiments to compare our model with baseline methods and the original attention-based interpolation model. The results show a significant improvement in the accuracy of house price predictions, validating the effectiveness of our approach. This research advances the field of spatial interpolation and provides a robust tool for more precise house price evaluation. Our GitHub repository.contains the data and code for all datasets, which are available for researchers and practitioners interested in replicating or building upon our work.
- [391] arXiv:2405.07458 [pdf, ps, other]
-
Title: Examining Humanness as a Metaphor to Design Voice User InterfacesComments: Accepted to appear in the proceedings of CUI 2024Subjects: Human-Computer Interaction (cs.HC)
Voice User Interfaces (VUIs) increasingly leverage 'humanness' as a foundational design metaphor, adopting roles like 'assistants,' 'teachers,' and 'secretaries' to foster natural interactions. Yet, this approach can sometimes misalign user trust and reinforce societal stereotypes, leading to socio-technical challenges that might impede long-term engagement. This paper explores an alternative approach to navigate these challenges-incorporating non-human metaphors in VUI design. We report on a study with 240 participants examining the effects of human versus non-human metaphors on user perceptions within health and finance domains. Results indicate a preference for the human metaphor (doctor) over the non-human (health encyclopedia) in health contexts for its perceived enjoyability and likeability. In finance, however, user perceptions do not significantly differ between human (financial advisor) and non-human (calculator) metaphors. Importantly, our research reveals that the explicit awareness of a metaphor's use influences adoption intentions, with a marked preference for non-human metaphors when their metaphorical nature is not disclosed. These findings highlight context-specific conversation design strategies required in integrating non-human metaphors into VUI design, suggesting tradeoffs and design considerations that could enhance user engagement and adoption.
- [392] arXiv:2405.07459 [pdf, ps, other]
-
Title: DualFocus: A Unified Framework for Integrating Positive and Negative Descriptors in Text-based Person RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV)
Text-based person retrieval (TPR) aims to retrieve images of a person from an extensive array of candidates based on a given textual description. The core challenge lies in mapping visual and textual data into a unified latent space. While existing TPR methods concentrate on recognizing explicit and positive characteristics, they often neglect the critical influence of negative descriptors, resulting in potential false positives that fulfill positive criteria but could be excluded by negative descriptors. To alleviate these issues, we introduce DualFocus, a unified framework for integrating positive and negative descriptors to enhance the interpretative accuracy of vision-language foundational models regarding textual queries. DualFocus employs Dual (Positive/Negative) Attribute Prompt Learning (DAPL), which integrates Dual Image-Attribute Contrastive (DIAC) Learning and Sensitive Image-Attributes Matching (SIAM) Learning. This way DualFocus enhances the detection of unseen attributes, thereby boosting retrieval precision. To further achieve a balance between coarse and fine-grained alignment of visual and textual embeddings, we propose the Dynamic Tokenwise Similarity (DTS) loss, which refines the representation of both matching and non-matching descriptions, thereby enhancing the matching process through a detailed and adaptable similarity assessment. By focusing on token-level comparisons, DualFocus significantly outperforms existing techniques in both precision and robustness. The experiment results highlight DualFocus's superior performance on CUHK-PEDES, ICFG-PEDES, and RSTPReid.
- [393] arXiv:2405.07460 [pdf, ps, other]
-
Title: HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
Developing accurate machine learning models for oncology requires large-scale, high-quality multimodal datasets. However, creating such datasets remains challenging due to the complexity and heterogeneity of medical data. To address this challenge, we introduce HoneyBee, a scalable modular framework for building multimodal oncology datasets that leverages foundational models to generate representative embeddings. HoneyBee integrates various data modalities, including clinical records, imaging data, and patient outcomes. It employs data preprocessing techniques and transformer-based architectures to generate embeddings that capture the essential features and relationships within the raw medical data. The generated embeddings are stored in a structured format using Hugging Face datasets and PyTorch dataloaders for accessibility. Vector databases enable efficient querying and retrieval for machine learning applications. We demonstrate the effectiveness of HoneyBee through experiments assessing the quality and representativeness of the embeddings. The framework is designed to be extensible to other medical domains and aims to accelerate oncology research by providing high-quality, machine learning-ready datasets. HoneyBee is an ongoing open-source effort, and the code, datasets, and models are available at the project repository.
- [394] arXiv:2405.07465 [pdf, ps, other]
-
Title: Deception in Differential Games: Information Limiting Strategy to Induce DilemmaSubjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)
Can deception exist in differential games? We provide a case study for a Turret-Attacker differential game, where two Attackers seek to score points by reaching a target region while a Turret tries to minimize the score by aligning itself with the Attackers before they reach the target. In contrast to the original problem solved with complete information, we assume that the Turret only has partial information about the maximum speed of the Attackers. We investigate whether there is any incentive for the Attackers to move slower than their maximum speed in order to ``deceive'' the Turret into taking suboptimal actions. We first describe the existence of a dilemma that the Turret may face. Then we derive a set of initial conditions from which the Attackers can force the Turret into a situation where it must take a guess.
- [395] arXiv:2405.07467 [pdf, ps, html, other]
-
Title: MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL GenerationSubjects: Computation and Language (cs.CL)
Recent advancements in large language models (LLMs) have enabled in-context learning (ICL)-based methods that significantly outperform fine-tuning approaches for text-to-SQL tasks. However, their performance is still considerably lower than that of human experts on benchmarks that include complex schemas and queries, such as BIRD. This study considers the sensitivity of LLMs to the prompts and introduces a novel approach that leverages multiple prompts to explore a broader search space for possible answers and effectively aggregate them. Specifically, we robustly refine the database schema through schema linking using multiple prompts. Thereafter, we generate various candidate SQL queries based on the refined schema and diverse prompts. Finally, the candidate queries are filtered based on their confidence scores, and the optimal query is obtained through a multiple-choice selection that is presented to the LLM. When evaluated on the BIRD and Spider benchmarks, the proposed method achieved execution accuracies of 65.5\% and 89.6\%, respectively, significantly outperforming previous ICL-based methods. Moreover, we established a new SOTA performance on the BIRD in terms of both the accuracy and efficiency of the generated queries.
- [396] arXiv:2405.07468 [pdf, ps, other]
-
Title: Evaluating large language models in medical applications: a surveyComments: 4 figures, 1 tableSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medical information. This paper provides a comprehensive overview of the landscape of medical LLM evaluation, synthesizing insights from existing studies and highlighting evaluation data sources, task scenarios, and evaluation methods. Additionally, it identifies key challenges and opportunities in medical LLM evaluation, emphasizing the need for continued research and innovation to ensure the responsible integration of LLMs into clinical practice.
- [397] arXiv:2405.07472 [pdf, ps, other]
-
Title: GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image PromptingComments: On-going workSubjects: Computer Vision and Pattern Recognition (cs.CV)
The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To further address issues, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy to gradually mitigate potential issues. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in leading to complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.
- [398] arXiv:2405.07473 [pdf, ps, other]
-
Title: Intrinsic Rewards for Exploration without Harm from Observational Noise: A Simulation Study Based on the Free Energy PrincipleComments: 54 pages, 11 figures, to be published in Neural ComputationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In Reinforcement Learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of action policy and curiosity for information gain. Entropy is well-established in literature, promoting randomized action selection. Curiosity is defined in a broad variety of ways in literature, promoting discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noises known as curiosity traps. Based on the Free Energy Principle (FEP), this paper proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity, and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find entropy and curiosity result in efficient exploration, especially both employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
- [399] arXiv:2405.07474 [pdf, ps, other]
-
Title: Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human InstructionsSubjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)
Robots executing tasks following human instructions in domestic or industrial environments essentially require both adaptability and reliability. Behavior Tree (BT) emerges as an appropriate control architecture for these scenarios due to its modularity and reactivity. Existing BT generation methods, however, either do not involve interpreting natural language or cannot theoretically guarantee the BTs' success. This paper proposes a two-stage framework for BT generation, which first employs large language models (LLMs) to interpret goals from high-level instructions, then constructs an efficient goal-specific BT through the Optimal Behavior Tree Expansion Algorithm (OBTEA). We represent goals as well-formed formulas in first-order logic, effectively bridging intent understanding and optimal behavior planning. Experiments in the service robot validate the proficiency of LLMs in producing grammatically correct and accurately interpreted goals, demonstrate OBTEA's superiority over the baseline BT Expansion algorithm in various metrics, and finally confirm the practical deployability of our framework. The project website is this https URL.
- [400] arXiv:2405.07475 [pdf, ps, other]
-
Title: How Non-native English Speakers Use, Assess, and Select AI-Generated Paraphrases with Information AidsSubjects: Human-Computer Interaction (cs.HC)
Non-native English speakers (NNESs) often face challenges in achieving fluency in their written English. AI paraphrasing tools have the potential to improve their writing by suggesting more fluent paraphrases to their original sentences. Yet, the effectiveness of these tools depends on the user's ability to accurately assess and select context-appropriate suggestions, which is a significant challenge for those with limited English proficiency. This paper explores how NNESs utilize a paraphrasing tool augmented with information aids designed to facilitate the assessment of paraphrased suggestions. Through a formative study with 15 NNESs, we identify their specific needs when paraphrasing with AI, leading to the design of a paraphrasing tool integrated with five types of information aids, termed "support features." A user study with 22 NNESs demonstrates their heavy reliance on the paraphrasing functionality throughout the writing process, where they leverage the support features to assess and select suggestions efficiently and comprehensively. When equipped with the support features, NNESs experience enhanced writing experience in efficiency, confidence, and trust. Our findings contribute to the HCI community by (i) identifying the distinct needs of NNESs in AI paraphrasing tools, (ii) elucidating how NNESs use paraphrasing tools with support features, and (iii) offering design implications for the development of more effective AI paraphrasing tools tailored to NNESs' requirements.
- [401] arXiv:2405.07478 [pdf, ps, other]
-
Title: Coded Event-triggered Control for Nonlinear SystemsSubjects: Systems and Control (eess.SY)
This paper studies a Coded Event-triggered Control (CEC) for a class of nonlinear systems under any initial condition. To reduce communication burden, the CEC is designed from the encoding-decoding viewpoint by which only $m$-length string is transmitted for each communication between CEC and actuator. If a more general Entry Capture Problem is encountered, such control design will be rather complicated yet challenging where the performance constraints are satisfied some time after (rather than from the beginning of) system operation, rendering normally employed prescribed performance control invalid because they may be not defined in the initial interval. By introducing auxiliary functions, we develop a Self-adjustable Prescribed Performance (SPP) mechanism which can flexibly adjust the symmetric or asymmetric performance boundaries to accommodate different initial conditions, providing an effective solution for the underlying tracking problem. In this way, the resulted CEC can not only consume less communication resources but also regulate the tracking error under any initial condition into an allowable set before a given time in a bounded and customizable manner. Simulation results verify and clarify the theoretical findings.
- [402] arXiv:2405.07479 [pdf, ps, other]
-
Title: Enhancing 3D Object Detection by Using Neural Network with Self-adaptive ThresholdingComments: This paper has been accepted by the CONF-SEML 2024Subjects: Robotics (cs.RO)
Robust 3D object detection remains a pivotal concern in the domain of autonomous field robotics. Despite notable enhancements in detection accuracy across standard datasets, real-world urban environments, characterized by their unstructured and dynamic nature, frequently precipitate an elevated incidence of false positives, thereby undermining the reliability of existing detection paradigms. In this context, our study introduces an advanced post-processing algorithm that modulates detection thresholds dynamically relative to the distance from the ego object. Traditional perception systems typically utilize a uniform threshold, which often leads to decreased efficacy in detecting distant objects. In contrast, our proposed methodology employs a Neural Network with a self-adaptive thresholding mechanism that significantly attenuates false negatives while concurrently diminishing false positives, particularly in complex urban settings. Empirical results substantiate that our algorithm not only augments the performance of 3D object detection models in diverse urban and adverse weather scenarios but also establishes a new benchmark for adaptive thresholding techniques in field robotics.
- [403] arXiv:2405.07481 [pdf, ps, other]
-
Title: Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout AnalysisComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either treated text detection and grouping using separate models, or train a model from scratch while using a unified one. All of them have not yet made full use of the already well-trained text detectors and easily obtainable detection datasets. In this paper, we present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis, allowing us to adopt a well-trained text detector right off the shelf or just fine-tune it efficiently. Designed to be compatible with various text detector architectures, TGA takes detected text regions and image features as universal inputs to assemble text instance features. To capture broader contextual information for layout analysis, we propose to predict text group masks from text instance features by one-to-many assignment. Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance, simultaneously inheriting generalized text detection ability from pre-training. In the case of full parameter fine-tuning, we can further improve layout analysis performance.
- [404] arXiv:2405.07487 [pdf, ps, other]
-
Title: An Efficient Compression Method for Sign Information of DCT Coefficients via Sign RetrievalJournal-ref: 2021 IEEE International Conference on Image ProcessingSubjects: Information Theory (cs.IT)
Compression of the sign information of discrete cosine transform coefficients is an intractable problem in image compression schemes due to the equiprobable occurrence of the sign bits. To overcome this difficulty, we propose an efficient compression method for such sign information based on phase retrieval, which is a classical signal restoration problem attempting to find the phase information of discrete Fourier transform coefficients from their magnitudes. In our compression strategy, the sign bits of all the AC components in the cosine domain are excluded from a bitstream at the encoder and are complemented at the decoder by solving a sign recovery problem, which we call \textit{sign retrieval}. The experimental results demonstrate that the proposed method outperforms previous techniques for sign compression in terms of a rate-distortion criterion. Our method implemented in Python language is available from \url{this https URL}.
- [405] arXiv:2405.07488 [pdf, ps, other]
-
Title: Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold NetworksSubjects: Machine Learning (cs.LG); Robotics (cs.RO); Symbolic Computation (cs.SC)
We present a novel approach to predicting the pressure and flow rate of flexible electrohydrodynamic pumps using the Kolmogorov-Arnold Network. Inspired by the Kolmogorov-Arnold representation theorem, KAN replaces fixed activation functions with learnable spline-based activation functions, enabling it to approximate complex nonlinear functions more effectively than traditional models like Multi-Layer Perceptron and Random Forest. We evaluated KAN on a dataset of flexible EHD pump parameters and compared its performance against RF, and MLP models. KAN achieved superior predictive accuracy, with Mean Squared Errors of 12.186 and 0.001 for pressure and flow rate predictions, respectively. The symbolic formulas extracted from KAN provided insights into the nonlinear relationships between input parameters and pump performance. These findings demonstrate that KAN offers exceptional accuracy and interpretability, making it a promising alternative for predictive modeling in electrohydrodynamic pumping.
- [406] arXiv:2405.07489 [pdf, ps, other]
-
Title: Sparse Domain Transfer via Elastic Net RegularizationSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Transportation of samples across different domains is a central task in several machine learning problems. A sensible requirement for domain transfer tasks in computer vision and language domains is the sparsity of the transportation map, i.e., the transfer algorithm aims to modify the least number of input features while transporting samples across the source and target domains. In this work, we propose Elastic Net Optimal Transport (ENOT) to address the sparse distribution transfer problem. The ENOT framework utilizes the $L_1$-norm and $L_2$-norm regularization mechanisms to find a sparse and stable transportation map between the source and target domains. To compute the ENOT transport map, we consider the dual formulation of the ENOT optimization task and prove that the sparsified gradient of the optimal potential function in the ENOT's dual representation provides the ENOT transport map. Furthermore, we demonstrate the application of the ENOT framework to perform feature selection for sparse domain transfer. We present the numerical results of applying ENOT to several domain transfer problems for synthetic Gaussian mixtures and real image and text data. Our empirical results indicate the success of the ENOT framework in identifying a sparse domain transport map.
- [407] arXiv:2405.07490 [pdf, ps, other]
-
Title: Strategic Data Ordering: Enhancing Large Language Model Performance through Curriculum LearningSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The rapid advancement of Large Language Models (LLMs) has improved text understanding and generation but poses challenges in computational resources. This study proposes a curriculum learning-inspired, data-centric training strategy that begins with simpler tasks and progresses to more complex ones, using criteria such as prompt length, attention scores, and loss values to structure the training data. Experiments with Mistral-7B (Jiang et al., 2023) and Gemma-7B (Team et al., 2024) models demonstrate that curriculum learning slightly improves performance compared to traditional random data shuffling. Notably, we observed that sorting data based on our proposed attention criteria generally led to better performance. This approach offers a sustainable method to enhance LLM performance without increasing model size or dataset volume, addressing scalability challenges in LLM training.
- [408] arXiv:2405.07493 [pdf, ps, other]
-
Title: Variable-Length Secret Key Agreement via Random Stopping TimeComments: 8 pagesSubjects: Information Theory (cs.IT)
We consider a key agreement setting where two parties observe correlated random sources, and want to agree on a secret key via public discussions. In order to allow the key length to adapt to the realizations of the random sources, we allow the key to be of variable length, subject to a novel variable-length version of the uniformity constraint based on random stopping time. We propose simple, computationally efficient key agreement schemes under the new constraint. The proposed scheme can be considered as the key agreement analogue of variable-length source coding via Huffman coding, and the Knuth-Yao random number generator.
- [409] arXiv:2405.07495 [pdf, ps, other]
-
Title: MacBehaviour: An R package for behavioural experimentation on large language modelsComments: 11 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
There has been increasing interest in investigating the behaviours of large language models (LLMs) and LLM-powered chatbots by treating an LLM as a participant in a psychological experiment. We therefore developed an R package called "MacBehaviour" that aims to interact with more than 60 language models in one package (e.g., OpenAI's GPT family, the Claude family, Gemini, Llama family, and open-source models) and streamline the experimental process of LLMs behaviour experiments. The package offers a comprehensive set of functions designed for LLM experiments, covering experiment design, stimuli presentation, model behaviour manipulation, logging response and token probability. To demonstrate the utility and effectiveness of "MacBehaviour," we conducted three validation experiments on three LLMs (GPT-3.5, Llama-2 7B, and Vicuna-1.5 13B) to replicate sound-gender association in LLMs. The results consistently showed that they exhibit human-like tendencies to infer gender from novel personal names based on their phonology, as previously demonstrated (Cai et al., 2023). In summary, "MacBehaviour" is an R package for machine behaviour studies which offers a user-friendly interface and comprehensive features to simplify and standardize the experimental process.
- [410] arXiv:2405.07496 [pdf, ps, other]
-
Title: Oedipus: LLM-enchanced Reasoning CAPTCHA SolverSubjects: Cryptography and Security (cs.CR)
CAPTCHAs have become a ubiquitous tool in safeguarding applications from automated bots. Over time, the arms race between CAPTCHA development and evasion techniques has led to increasingly sophisticated and diverse designs. The latest iteration, reasoning CAPTCHAs, exploits tasks that are intuitively simple for humans but challenging for conventional AI technologies, thereby enhancing security measures.
Driven by the evolving AI capabilities, particularly the advancements in Large Language Models (LLMs), we investigate the potential of multimodal LLMs to solve modern reasoning CAPTCHAs. Our empirical analysis reveals that, despite their advanced reasoning capabilities, LLMs struggle to solve these CAPTCHAs effectively. In response, we introduce Oedipus, an innovative end-to-end framework for automated reasoning CAPTCHA solving. Central to this framework is a novel strategy that dissects the complex and human-easy-AI-hard tasks into a sequence of simpler and AI-easy steps. This is achieved through the development of a Domain Specific Language (DSL) for CAPTCHAs that guides LLMs in generating actionable sub-steps for each CAPTCHA challenge. The DSL is customized to ensure that each unit operation is a highly solvable subtask revealed in our previous empirical study. These sub-steps are then tackled sequentially using the Chain-of-Thought (CoT) methodology.
Our evaluation shows that Oedipus effectively resolves the studied CAPTCHAs, achieving an average success rate of 63.5\%. Remarkably, it also shows adaptability to the most recent CAPTCHA designs introduced in late 2023, which are not included in our initial study. This prompts a discussion on future strategies for designing reasoning CAPTCHAs that can effectively counter advanced AI solutions. - [411] arXiv:2405.07497 [pdf, ps, html, other]
-
Title: Towards Subgraph Isomorphism Counting with Graph KernelsSubjects: Machine Learning (cs.LG)
Subgraph isomorphism counting is known as #P-complete and requires exponential time to find the accurate solution. Utilizing representation learning has been shown as a promising direction to represent substructures and approximate the solution. Graph kernels that implicitly capture the correlations among substructures in diverse graphs have exhibited great discriminative power in graph classification, so we pioneeringly investigate their potential in counting subgraph isomorphisms and further explore the augmentation of kernel capability through various variants, including polynomial and Gaussian kernels. Through comprehensive analysis, we enhance the graph kernels by incorporating neighborhood information. Finally, we present the results of extensive experiments to demonstrate the effectiveness of the enhanced graph kernels and discuss promising directions for future research.
- [412] arXiv:2405.07500 [pdf, ps, html, other]
-
Title: PromptLink: Leveraging Large Language Models for Cross-Source Biomedical Concept LinkingJournal-ref: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Short-Paper Track), 2024Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Linking (aligning) biomedical concepts across diverse data sources enables various integrative analyses, but it is challenging due to the discrepancies in concept naming conventions. Various strategies have been developed to overcome this challenge, such as those based on string-matching rules, manually crafted thesauri, and machine learning models. However, these methods are constrained by limited prior biomedical knowledge and can hardly generalize beyond the limited amounts of rules, thesauri, or training samples. Recently, large language models (LLMs) have exhibited impressive results in diverse biomedical NLP tasks due to their unprecedentedly rich prior knowledge and strong zero-shot prediction abilities. However, LLMs suffer from issues including high costs, limited context length, and unreliable predictions. In this research, we propose PromptLink, a novel biomedical concept linking framework that leverages LLMs. It first employs a biomedical-specialized pre-trained language model to generate candidate concepts that can fit in the LLM context windows. Then it utilizes an LLM to link concepts through two-stage prompts, where the first-stage prompt aims to elicit the biomedical prior knowledge from the LLM for the concept linking task and the second-stage prompt enforces the LLM to reflect on its own predictions to further enhance their reliability. Empirical results on the concept linking task between two EHR datasets and an external biomedical KG demonstrate the effectiveness of PromptLink. Furthermore, PromptLink is a generic framework without reliance on additional prior knowledge, context, or training data, making it well-suited for concept linking across various types of data sources. The source code is available at this https URL.
- [413] arXiv:2405.07503 [pdf, ps, other]
-
Title: Consistency Policy: Accelerated Visuomotor Policies via Consistency DistillationComments: this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control. By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups. A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy's learned trajectories. We compare Consistency Policy with Diffusion Policy and other related speed-up methods across 6 simulation tasks as well as two real-world tasks where we demonstrate inference on a laptop GPU. For all these tasks, Consistency Policy speeds up inference by an order of magnitude compared to the fastest alternative method and maintains competitive success rates. We also show that the Conistency Policy training procedure is robust to the pretrained Diffusion Policy's quality, a useful result that helps practioners avoid extensive testing of the pretrained model. Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps. Code and training details will be released publicly.
- [414] arXiv:2405.07505 [pdf, ps, other]
-
Title: A cyclic proof system for Guarded Kleene Algebra with Tests (full version)Comments: Full version of paper accepted at IJCAR 2024Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL); Logic (math.LO)
Guarded Kleene Algebra with Tests (GKAT for short) is an efficient fragment of Kleene Algebra with Tests, suitable for reasoning about simple imperative while-programs. Following earlier work by Das and Pous on Kleene Algebra, we study GKAT from a proof-theoretical perspective. The deterministic nature of GKAT allows for a non-well-founded sequent system whose set of regular proofs is complete with respect to the guarded language model. This is unlike the situation with Kleene Algebra, where hypersequents are required. Moreover, the decision procedure induced by proof search runs in NLOGSPACE, whereas that of Kleene Algebra is in PSPACE.
- [415] arXiv:2405.07506 [pdf, ps, other]
-
Title: Chronoblox: Chronophotographic Sequential Graph VisualizationSubjects: Human-Computer Interaction (cs.HC)
We introduce Chronoblox, a system for visualizing dynamic graphs. Chronoblox consists of a chronophotography of a sequence of graph snapshots based on a single embedding space common to all time periods. The goal of Chronoblox is to project all snapshots onto a common visualization space so as to represent both local and global dynamics at a glance. In this short paper, we review both the embedding and spatialization strategies. We then explain the way in which Chronoblox translates micro to meso structural evolution visually. We finally evaluate our approach using a synthetic network before illustrating it on a real world retweet network.
- [416] arXiv:2405.07508 [pdf, ps, html, other]
-
Title: Revealing the value of Repository Centrality in lifespan prediction of Open Source Software ProjectsSubjects: Software Engineering (cs.SE)
Background: Open Source Software is the building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of the downstream systems and the broad ecosystem. Therefore it calls for efforts in monitoring and predicting project deprecations, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly focus on static features on a point in time to make predictions, resulting in limited effects. Goal: We propose a novel metric from the user-repository network, and leverage the metric to fit project deprecation predictors and prove its real-life implications. Method: We establish a comprehensive dataset containing 103,354 non-fork GitHub OSS projects spanning from 2011 to 2023. We propose repository centrality, a family of HITS weights that captures shifts in the popularity of a repository in the repository-user star network. Further with the metric, we utilize the advancements in gradient boosting and deep learning to fit survival analysis models to predict project lifespan or its survival hazard. Results: Our study reveals a correlation between the HITS centrality metrics and the repository deprecation risk. A drop in the HITS weights of a repository indicates a decline in its centrality and prevalence, leading to an increase in its deprecation risk and a decrease in its expected lifespan. Our predictive models powered by repository centrality and other repository features achieve satisfactory accuracy on the test set, with repository centrality being the most significant feature among all. Implications: This research offers a novel perspective on understanding the effect of prevalence on the deprecation of OSS repositories. Our approach to predict repository deprecation help detect health status of project and take actions in advance, fostering a more resilient OSS ecosystem.
- [417] arXiv:2405.07509 [pdf, ps, other]
-
Title: RESTAD: REconstruction and Similarity based Transformer for time series Anomaly DetectionComments: Manuscript under reviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Anomaly detection in time series data is crucial across various domains. The scarcity of labeled data for such tasks has increased the attention towards unsupervised learning methods. These approaches, often relying solely on reconstruction error, typically fail to detect subtle anomalies in complex datasets. To address this, we introduce RESTAD, an adaptation of the Transformer model by incorporating a layer of Radial Basis Function (RBF) neurons within its architecture. This layer fits a non-parametric density in the latent representation, such that a high RBF output indicates similarity with predominantly normal training data. RESTAD integrates the RBF similarity scores with the reconstruction errors to increase sensitivity to anomalies. Our empirical evaluations demonstrate that RESTAD outperforms various established baselines across multiple benchmark datasets.
- [418] arXiv:2405.07510 [pdf, ps, other]
-
Title: PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play AcceleratorSubjects: Machine Learning (cs.LG)
We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in a few-step generation. Moreover, through dedicated parameterizations, the obtained PeRFlow models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on the pre-trained diffusion models. The implementations of training and inference are fully open-sourced. this https URL
- [419] arXiv:2405.07513 [pdf, ps, other]
-
Title: Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and DocumentsComments: SwissText 2024Subjects: Computation and Language (cs.CL)
Encoder models trained for the embedding of sentences or short documents have proven useful for tasks such as semantic search and topic modeling. In this paper, we present a version of the SwissBERT encoder model that we specifically fine-tuned for this purpose. SwissBERT contains language adapters for the four national languages of Switzerland -- German, French, Italian, and Romansh -- and has been pre-trained on a large number of news articles in those languages. Using contrastive learning based on a subset of these articles, we trained a fine-tuned version, which we call SentenceSwissBERT. Multilingual experiments on document retrieval and text classification in a Switzerland-specific setting show that SentenceSwissBERT surpasses the accuracy of the original SwissBERT model and of a comparable baseline. The model is openly available for research use.
- [420] arXiv:2405.07515 [pdf, ps, other]
-
Title: OpenBot-Fleet: A System for Collective Learning with Real RobotsComments: Accepted at ICRA'24Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We introduce OpenBot-Fleet, a comprehensive open-source cloud robotics system for navigation. OpenBot-Fleet uses smartphones for sensing, local compute and communication, Google Firebase for secure cloud storage and off-board compute, and a robust yet low-cost wheeled robot toact in real-world environments. The robots collect task data and upload it to the cloud where navigation policies can be learned either offline or online and can then be sent back to the robot fleet. In our experiments we distribute 72 robots to a crowd of workers who operate them in homes, and show that OpenBot-Fleet can learn robust navigation policies that generalize to unseen homes with >80% success rate. OpenBot-Fleet represents a significant step forward in cloud robotics, making it possible to deploy large continually learning robot fleets in a cost-effective and scalable manner. All materials can be found at this https URL. A video is available at this https URL
- [421] arXiv:2405.07516 [pdf, ps, other]
-
Title: Support-Query Prototype Fusion Network for Few-shot Medical Image SegmentationComments: 19 pages, 7 figures, 4 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, deep learning based on Convolutional Neural Networks (CNNs) has achieved remarkable success in many applications. However, their heavy reliance on extensive labeled data and limited generalization ability to unseen classes pose challenges to their suitability for medical image processing tasks. Few-shot learning, which utilizes a small amount of labeled data to generalize to unseen classes, has emerged as a critical research area, attracting substantial attention. Currently, most studies employ a prototype-based approach, in which prototypical networks are used to construct prototypes from the support set, guiding the processing of the query set to obtain the final results. While effective, this approach heavily relies on the support set while neglecting the query set, resulting in notable disparities within the model classes. To mitigate this drawback, we propose a novel Support-Query Prototype Fusion Network (SQPFNet). SQPFNet initially generates several support prototypes for the foreground areas of the support images, thus producing a coarse segmentation mask. Subsequently, a query prototype is constructed based on the coarse segmentation mask, additionally exploiting pattern information in the query set. Thus, SQPFNet constructs high-quality support-query fused prototypes, upon which the query image is segmented to obtain the final refined query mask. Evaluation results on two public datasets, SABS and CMR, show that SQPFNet achieves state-of-the-art performance.
- [422] arXiv:2405.07518 [pdf, ps, other]
-
Title: SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of ExpertsRaghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, Kunle OlukotunSubjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators have created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them.
In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU) - a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100. - [423] arXiv:2405.07520 [pdf, ps, other]
-
Title: Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid ApproachesComments: Submitted to journal and under review, once the paper is accepted, the copyright will be transferred to the corresponding journalSubjects: Computer Vision and Pattern Recognition (cs.CV)
High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously, papers describing dehazing architectures applicable to various Remote Sensing (RS) domains are also being published. This review goes beyond the traditional focus on benchmarked haze datasets, as we also explore the application of dehazing techniques to remote sensing and UAV datasets, providing a comprehensive overview of both deep learning and prior-based approaches in these domains. We identify key challenges, including the lack of large-scale RS datasets and the need for more robust evaluation metrics, and outline potential solutions and future research directions to address them. This review is the first, to our knowledge, to provide comprehensive discussions on both existing and very recent dehazing approaches (as of 2024) on benchmarked and RS datasets, including UAV-based imagery.
- [424] arXiv:2405.07523 [pdf, ps, other]
-
Title: Adaptation of Distinct Semantics for Uncertain Areas in Polyp SegmentationComments: 13 pages with 7 figures, British Machine Vision Conference 2023Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Colonoscopy is a common and practical method for detecting and treating polyps. Segmenting polyps from colonoscopy image is useful for diagnosis and surgery progress. Nevertheless, achieving excellent segmentation performance is still difficult because of polyp characteristics like shape, color, condition, and obvious non-distinction from the surrounding context. This work presents a new novel architecture namely Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation (ADSNet), which modifies misclassified details and recovers weak features having the ability to vanish and not be detected at the final stage. The architecture consists of a complementary trilateral decoder to produce an early global map. A continuous attention module modifies semantics of high-level features to analyze two separate semantics of the early global map. The suggested method is experienced on polyp benchmarks in learning ability and generalization ability, experimental results demonstrate the great correction and recovery ability leading to better segmentation performance compared to the other state of the art in the polyp image segmentation task. Especially, the proposed architecture could be experimented flexibly for other CNN-based encoders, Transformer-based encoders, and decoder backbones.
- [425] arXiv:2405.07524 [pdf, ps, other]
-
Title: HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV)
Deep image hashing aims to map input images into simple binary hash codes via deep neural networks and thus enable effective large-scale image retrieval. Recently, hybrid networks that combine convolution and Transformer have achieved superior performance on various computer tasks and have attracted extensive attention from researchers. Nevertheless, the potential benefits of such hybrid networks in image retrieval still need to be verified. To this end, we propose a hybrid convolutional and self-attention deep hashing method known as HybridHash. Specifically, we propose a backbone network with stage-wise architecture in which the block aggregation function is introduced to achieve the effect of local self-attention and reduce the computational complexity. The interaction module has been elaborately designed to promote the communication of information between image blocks and to enhance the visual representations. We have conducted comprehensive experiments on three widely used datasets: CIFAR-10, NUS-WIDE and IMAGENET. The experimental results demonstrate that the method proposed in this paper has superior performance with respect to state-of-the-art deep hashing methods. Source code is available this https URL.
- [426] arXiv:2405.07526 [pdf, ps, html, other]
-
Title: MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click LabelsQi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, Harsha Vardhan Simhadri, Manik Varma, Yujing Wang, Linjun Yang, Mao Yang, Ce ZhangComments: 10 pages, 6 figures, for associated dataset, see this http URLSubjects: Information Retrieval (cs.IR)
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks and encourages research in various areas, such as generic end-to-end neural indexer models, generic embedding models, and next generation information access system with large language models. MS MARCO Web Search offers a retrieval benchmark with three web retrieval challenge tasks that demand innovations in both machine learning and information retrieval system research domains. As the first dataset that meets large, real and rich data requirements, MS MARCO Web Search paves the way for future advancements in AI and system research. MS MARCO Web Search dataset is available at: this https URL.
- [427] arXiv:2405.07527 [pdf, ps, other]
-
Title: Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized ModelsYubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li ShangComments: Accepted at NeurIPS 2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Despite their prevalence in deep-learning communities, over-parameterized models convey high demands of computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-attention models, we can observe varying learning patterns implicitly associated with each module's trainability. To describe such modular-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$. A large $\lambda_{\max}$ indicates that the module learns features with better convergence, while those miniature ones may impact generalization negatively. Inspired by the discovery, we propose a novel training strategy termed Modular Adaptive Training (MAT) to update those modules with their $\lambda_{\max}$ exceeding a dynamic threshold selectively, concentrating the model on learning common features and ignoring those inconsistent ones. Unlike most existing training schemes with a complete BP cycle across all network modules, MAT can significantly save computations by its partially-updating strategy and can further improve performance. Experiments show that MAT nearly halves the computational cost of model training and outperforms the accuracy of baselines.
- [428] arXiv:2405.07528 [pdf, ps, other]
-
Title: Comparing Perceptions of Static and Adaptive Proactive Speech AgentsComments: Accepted to CUI 2024 - 6th Conference on Conversational User InterfacesSubjects: Human-Computer Interaction (cs.HC)
A growing literature on speech interruptions describes how people interrupt one another with speech, but these behaviours have not yet been implemented in the design of artificial agents which interrupt. Perceptions of a prototype proactive speech agent which adapts its speech to both urgency and to the difficulty of the ongoing task it interrupts are compared against perceptions of a static proactive agent which does not. The study hypothesises that adaptive proactive speech modelled on human speech interruptions will lead to partner models which consider the proactive agent as a stronger conversational partner than a static agent, and that interruptions initiated by an adaptive agent will be judged as better timed and more appropriately asked. These hypotheses are all rejected however, as quantitative analysis reveals that participants view the adaptive agent as a poorer dialogue partner than the static agent and as less appropriate in the style it interrupts. Qualitative analysis sheds light on the source of this surprising finding, as participants see the adaptive agent as less socially appropriate and as less consistent in its interactions than the static agent.
- [429] arXiv:2405.07530 [pdf, ps, other]
-
Title: Prompt-based Code Completion via Multi-Retrieval Augmented GenerationSubjects: Software Engineering (cs.SE)
Automated code completion, aiming at generating subsequent tokens from unfinished code, has been significantly benefited from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence issues and hallucinations when dealing with complex code logic or extrapolating beyond their training data. Existing Retrieval Augmented Generation (RAG) techniques partially address these issues by retrieving relevant code with a separate encoding model where the retrieved snippet serves as contextual reference for code completion. However, their retrieval scope is subject to a singular perspective defined by the encoding model, which largely overlooks the complexity and diversity inherent in code semantics. To address this limitation, we propose ProCC, a code completion framework leveraging prompt engineering and the contextual multi-armed bandits algorithm to flexibly incorporate and adapt to multiple perspectives of code. ProCC first employs a prompt-based multi-retriever system which crafts prompt templates to elicit LLM knowledge to understand code semantics with multiple retrieval perspectives. Then, it adopts the adaptive retrieval selection algorithm to incorporate code similarity into the decision-making process to determine the most suitable retrieval perspective for the LLM to complete the code. Experimental results demonstrate that ProCC outperforms state-of-the-art code completion technique by 8.6% on our collected open-source benchmark suite and 10.1% on the private-domain benchmark suite collected from a billion-user e-commerce company in terms of Exact Match. ProCC also allows augmenting fine-tuned techniques in a plug-and-play manner, yielding 5.6% improvement over our studied fine-tuned model.
- [430] arXiv:2405.07533 [pdf, ps, other]
-
Title: DID Connect: Authentication in TLS with Decentralized Identifiers and Verifiable CredentialsSandro Rodriguez Garzon, Dennis Natusch, Artur Philipp, Axel Küpper, Hans Joachim Einsiedler, Daniela SchneiderComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Authentication in TLS is predominately carried out with X.509 digital certificates issued by certificate authorities (CA). The centralized nature of current public key infrastructures, however, comes along with severe risks, such as single points of failure and susceptibility to cyber-attacks, potentially undermining the security and trustworthiness of the entire system. With Decentralized Identifiers (DID) alongside distributed ledger technology, it becomes technically feasible to prove ownership of a unique identifier without requiring an attestation of the proof's public key by a centralized and therefore vulnerable CA. This article presents DID Connect, a novel authentication scheme for TLS 1.3 that empowers entities to authenticate in a TLS-compliant way with self-issued X.509 certificates that are equipped with ledger-anchored DIDs instead of CA-issued identifiers. It facilitates the exchange of tamper-proof and 3rd-party attested claims in the form of DID-bound Verifiable Credentials after the TLS handshake to complete the authentication with a full identification of the communication partner. A prototypical implementation shows comparable TLS handshake durations of DID Connect if verification material is cached and reasonable prolongations if it is obtained from a ledger. The significant speed improvement of the resulting TLS channel over a widely used, DID-based alternative transport protocol on the application layer demonstrates the potential of DID Connect to become a viable solution for the establishment of secure and trustful end-to-end communication links with decentrally managed digital identities.
- [431] arXiv:2405.07536 [pdf, ps, other]
-
Title: Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path GeneratorSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
To deal with the task assignment problem of multi-AUV systems under kinematic constraints, which means steering capability constraints for underactuated AUVs or other vehicles likely, an improved task assignment algorithm is proposed combining the Dubins Path algorithm with improved SOM neural network algorithm. At first, the aimed tasks are assigned to the AUVs by improved SOM neural network method based on workload balance and neighborhood function. When there exists kinematic constraints or obstacles which may cause failure of trajectory planning, task re-assignment will be implemented by change the weights of SOM neurals, until the AUVs can have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. AUV's yaw angle is limited, which result in new assignments to the targets. Computation flow is designed so that the algorithm in MATLAB and Python can realizes the path planning to multiple targets. Finally, simulation results prove that the proposed algorithm can effectively accomplish the task assignment task for multi-AUV system.
- [432] arXiv:2405.07537 [pdf, ps, html, other]
-
Title: Probabilistic Rounding Error Analysis From A Statistical PerspectiveComments: 24 pages, 7 figures. Submitted to SIAM for possible publicationSubjects: Numerical Analysis (math.NA)
The conventional probabilistic rounding error analysis in numerical linear algebra provides worst-case bounds with an associated failure probability, which can still be pessimistic. In this paper, we develop a new probabilistic rounding error analysis from a statistical perspective. By assuming both the data and the relative error are independent random variables, we derive the approximate closed-form expressions for the expectation and variance of the rounding errors in various key computational kernels. Our analytical expressions have three notable characteristics: they are statistical and do not involve a failure probability; they are sharper than other deterministic and probabilistic bounds, using mean square error as the metric; they are correct to all orders of unit roundoff and valid for any dimension. Furthermore, numerical experiments validate the accuracy of our derivations and demonstrate that our analytical expressions are generally at least two orders of magnitude tighter than alternative worst-case bounds, exemplified through the inner products. We also discuss a scenario involving inner products where the underlying assumptions are invalid, i.e., input data are dependent, rendering the analytical expressions inapplicable.
- [433] arXiv:2405.07538 [pdf, ps, other]
-
Title: Mirroring the Parking Target: An Optimal-Control-Based Parking Motion Planner with Strengthened Parking Reliability and Faster Parking CompletionSubjects: Robotics (cs.RO)
Automated Parking Assist (APA) systems are now facing great challenges of low adoption in applications, due to users' concerns about parking capability, reliability, and completion efficiency. To upgrade the conventional APA planners and enhance user's acceptance, this research proposes an optimal-control-based parking motion planner. Its highlight lies in its control logic: planning trajectories by mirroring the parking target. This method enables: i) parking capability in narrow spaces; ii) better parking reliability by expanding Operation Design Domain (ODD); iii) faster completion of parking process; iv) enhanced computational efficiency; v) universal to all types of parking. A comprehensive evaluation is conducted. Results demonstrate the proposed planner does enhance parking success rate by 40.6%, improve parking completion efficiency by 18.0%, and expand ODD by 86.1%. It shows its superiority in difficult parking cases, such as the parallel parking scenario and narrow spaces. Moreover, the average computation time of the proposed planner is 74 milliseconds. Results indicate that the proposed planner is ready for real-time commercial applications.
- [434] arXiv:2405.07541 [pdf, ps, other]
-
Title: Random walk model that universally generates inverse square L\'evy walk by eliminating search cost minimization constraintShuji Shinohara, Daiki Morita, Hayato Hirai, Ryosuke Kuribayashi, Nobuhito Manome, Toru Moriyama, Hiroshi Okamoto, Yoshihiro Nakajima, Pegio-Yukio Gunji, Ung-il ChungSubjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
The Lévy walk, a type of random walk characterized by linear step lengths that follow a power-law distribution, is observed in the migratory behaviors of various organisms, ranging from bacteria to humans. Notably, Lévy walks with power exponents close to two are frequently observed, though their underlying causes remain elusive. This study introduces a simplified, abstract random walk model designed to produce inverse square Lévy walks, also known as Cauchy walks and explores the conditions that facilitate these phenomena. In our model, agents move toward a randomly selected destination in multi-dimensional space, and their movement strategy is parameterized by the extent to which they pursue the shortest path. When the search cost is proportional to the distance traveled, this parameter effectively reflects the emphasis on minimizing search costs. Our findings reveal that strict adherence to this cost minimization constraint results in a Brownian walk pattern. However, removing this constraint transitions the movement to an inverse square Lévy walk. Therefore, by modulating the prioritization of search costs, our model can seamlessly alternate between Brownian and Cauchy walk dynamics. This model has the potential to be utilized for exploring the parameter space of an optimization problem.
- [435] arXiv:2405.07542 [pdf, ps, other]
-
Title: EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language ModelsSubjects: Computation and Language (cs.CL)
Speculative decoding emerges as a pivotal technique for enhancing the inference speed of Large Language Models (LLMs). Despite recent research aiming to improve prediction efficiency, multi-sample speculative decoding has been overlooked due to varying numbers of accepted tokens within a batch in the verification phase. Vanilla method adds padding tokens in order to ensure that the number of new tokens remains consistent across samples. However, this increases the computational and memory access overhead, thereby reducing the speedup ratio. We propose a novel method that can resolve the issue of inconsistent tokens accepted by different samples without necessitating an increase in memory or computing overhead. Furthermore, our proposed method can handle the situation where the prediction tokens of different samples are inconsistent without the need to add padding tokens. Sufficient experiments demonstrate the efficacy of our method. Our code is available at this https URL.
- [436] arXiv:2405.07543 [pdf, ps, other]
-
Title: Accelerating the Evolution of Personalized Automated Lane Change through Lesson LearningSubjects: Machine Learning (cs.LG); Robotics (cs.RO)
Personalization is crucial for the widespread adoption of advanced driver assistance system. To match up with each user's preference, the online evolution capability is a must. However, conventional evolution methods learn from naturalistic driving data, which requires a lot computing power and cannot be applied online. To address this challenge, this paper proposes a lesson learning approach: learning from driver's takeover interventions. By leveraging online takeover data, the driving zone is generated to ensure perceived safety using Gaussian discriminant analysis. Real-time corrections to trajectory planning rewards are enacted through apprenticeship learning. Guided by the objective of optimizing rewards within the constraints of the driving zone, this approach employs model predictive control for trajectory planning. This lesson learning framework is highlighted for its faster evolution capability, adeptness at experience accumulating, assurance of perceived safety, and computational efficiency. Simulation results demonstrate that the proposed system consistently achieves a successful customization without further takeover interventions. Accumulated experience yields a 24% enhancement in evolution efficiency. The average number of learning iterations is only 13.8. The average computation time is 0.08 seconds.
- [437] arXiv:2405.07544 [pdf, ps, other]
-
Title: Automatic Odometry-Less OpenDRIVE Generation From Sparse Point CloudsComments: 8 pages, 4 figures, 3 algorithms, 2 tablesJournal-ref: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
High-resolution road representations are a key factor for the success of (highly) automated driving functions. These representations, for example, high-definition (HD) maps, contain accurate information on a multitude of factors, among others: road geometry, lane information, and traffic signs. Through the growing complexity and functionality of automated driving functions, also the requirements on testing and evaluation grow continuously. This leads to an increasing interest in virtual test drives for evaluation purposes. As roads play a crucial role in traffic flow, accurate real-world representations are needed, especially when deriving realistic driving behavior data. This paper proposes a novel approach to generate realistic road representations based solely on point cloud information, independent of the LiDAR sensor, mounting position, and without the need for odometry data, multi-sensor fusion, machine learning, or highly-accurate calibration. As the primary use case is simulation, we use the OpenDRIVE format for evaluation.
- [438] arXiv:2405.07547 [pdf, ps, other]
-
Title: Channel Coding Toward 6G: Technical Overview and OutlookComments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Things (IoT) communications. As the fifth generation (5G) of mobile communications is currently in operation and 5G-advanced is on the horizon, the objective of this paper is to assess prominent channel coding schemes in the context of recent advancements and the anticipated requirements for the sixth generation (6G). In this paper, after considering the potential impact of channel coding on key performance indicators (KPIs) of wireless networks, we review the evolution of mobile communication standards and the organizations involved in the standardization, from the first generation (1G) to the current 5G, highlighting the technologies integral to achieving targeted KPIs such as reliability, data rate, latency, energy efficiency, spectral efficiency, connection density, and traffic capacity. Following this, we delve into the anticipated requirements for potential use cases in 6G. The subsequent sections of the paper focus on a comprehensive review of three primary coding schemes utilized in past generations and their recent advancements: low-density parity-check (LDPC) codes, turbo codes (including convolutional codes), polar codes (alongside Reed-Muller codes). Additionally, we examine alternative coding schemes like Fountain codes and sparse regression codes. Our evaluation includes a comparative analysis of error correction performance and the performance of hardware implementation for these coding schemes, providing insights into their potential and suitability for the upcoming 6G era.
- [439] arXiv:2405.07550 [pdf, ps, html, other]
-
Title: Wild Berry image dataset collected in Finnish forests and peatlands using dronesLuigi Riz, Sergio Povoli, Andrea Caraffa, Davide Boscaini, Mohamed Lamine Mekhalfi, Paul Chippendale, Marjut Turtiainen, Birgitta Partanen, Laura Smith Ballester, Francisco Blanes Noguera, Alessio Franchi, Elisa Castelli, Giacomo Piccinini, Luca Marchesotti, Micael Santos Couceiro, Fabio PoiesiSubjects: Computer Vision and Pattern Recognition (cs.CV)
Berry picking has long-standing traditions in Finland, yet it is challenging and can potentially be dangerous. The integration of drones equipped with advanced imaging techniques represents a transformative leap forward, optimising harvests and promising sustainable practices. We propose WildBe, the first image dataset of wild berries captured in peatlands and under the canopy of Finnish forests using drones. Unlike previous and related datasets, WildBe includes new varieties of berries, such as bilberries, cloudberries, lingonberries, and crowberries, captured under severe light variations and in cluttered environments. WildBe features 3,516 images, including a total of 18,468 annotated bounding boxes. We carry out a comprehensive analysis of WildBe using six popular object detectors, assessing their effectiveness in berry detection across different forest regions and camera types. We will release WildBe publicly.
- [440] arXiv:2405.07551 [pdf, ps, other]
-
Title: MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical ReasoningComments: The state-of-the-art open-source tool-use LLMs for mathematical reasoningSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods chose another track: augmenting math reasoning data. However, a great method to integrate the above two research paths and combine their advantages remains to be explored. In this work, we firstly include new math questions via multi-perspective data augmenting methods and then synthesize code-nested solutions to them. The open LLMs (i.e., Llama-2) are finetuned on the augmented dataset to get the resulting models, MuMath-Code ($\mu$-Math-Code). During the inference phase, our MuMath-Code generates code and interacts with the external python interpreter to get the execution results. Therefore, MuMath-Code leverages the advantages of both the external tool and data augmentation. To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code. Our MuMath-Code-7B achieves 83.8 on GSM8K and 52.4 on MATH, while MuMath-Code-70B model achieves new state-of-the-art performance among open methods -- achieving 90.7% on GSM8K and 55.1% on MATH. Extensive experiments validate the combination of tool use and data augmentation, as well as our two-stage training strategy. We release the proposed dataset along with the associated code for public use.
- [441] arXiv:2405.07553 [pdf, ps, other]
-
Title: Space Domain based Ecological Cooperative and Adaptive Cruise Control on Rolling TerrainSubjects: Robotics (cs.RO)
Ecological Cooperative and Adaptive Cruise Control (Eco-CACC) is widely focused to enhance sustainability of CACC. However, state-of-the-art Eco-CACC studies are still facing challenges in adopting on rolling terrain. Furthermore, they cannot ensure both ecology optimality and computational efficiency. Hence, this paper proposes a nonlinear optimal control based Eco-CACC controller. It has the following features: i) enhancing performance across rolling terrains by modeling in space domain; ii) enhancing fuel efficiency via globally optimizing all vehicle's fuel consumptions; iii) ensuring computational efficiency by developing a differential dynamic programming-based solving method for the non-linear optimal control problem; iv) ensuring string stability through theoretically proving and experimentally validating. The performance of the proposed Eco-CACC controller was evaluated. Results showed that the proposed Eco-CACC controller can improve average fuel saving by 37.67% at collector road and about 17.30% at major arterial.
- [442] arXiv:2405.07556 [pdf, ps, other]
-
Title: Safety-Aware Human-Lead Vehicle Platooning by Proactively Reacting to Uncertain Human BehavingSubjects: Robotics (cs.RO)
Human-Lead Cooperative Adaptive Cruise Control (HL-CACC) is regarded as a promising vehicle platooning technology in real-world implementation. By utilizing a Human-driven Vehicle (HV) as the platoon leader, HL-CACC reduces the cost and enhances the reliability of perception and decision-making. However, state-of-the-art HL-CACC technology still has a great limitation on driving safety for the lack of considering the leading human driver's uncertain behaving. In this study, a HL-CACC controller is designed based on Stochastic Model Predictive Control (SMPC). It is enabled to predict the driving intention of the leading Connected Human-Driven Vehicle (CHV). The proposed controller has the following features: i) enhanced perceived safety in oscillating traffic; ii) guaranteed safety against hard brakes; iii) computational efficient for real-time implementation. The proposed controller is evaluated on a PreScan&Simulink simulation platform. Real vehicle trajectory data is collected for the calibration of simulation. Results reveal that the proposed controller: i) improves perceived safety by 19.17% in oscillating traffic; ii) enhances actual safety by 7.76% against hard brake; iii) is confirmed with string stability. The computation time is approximately 3 milliseconds when running on a laptop equipped with an Intel i5-13500H CPU. This indicates the proposed controller is ready for real-time implementation.
- [443] arXiv:2405.07557 [pdf, ps, other]
-
Title: Towards Rational Consensus in Honest MajoritySubjects: Computer Science and Game Theory (cs.GT); Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed consensus protocols reach agreement among $n$ players in the presence of $f$ adversaries; different protocols support different values of $f$. Existing works study this problem for different adversary types (captured by threat models). There are three primary threat models: (i) Crash fault tolerance (CFT), (ii) Byzantine fault tolerance (BFT), and (iii) Rational fault tolerance (RFT), each more general than the previous. Agreement in repeated rounds on both (1) the proposed value in each round and (2) the ordering among agreed-upon values across multiple rounds is called Atomic BroadCast (ABC). ABC is more generalized than consensus and is employed in blockchains.
This work studies ABC under the RFT threat model. We consider $t$ byzantine and $k$ rational adversaries among $n$ players. We also study different types of rational players based on their utility towards (1) liveness attack, (2) censorship or (3) disagreement (forking attack). We study the problem of ABC under this general threat model in partially-synchronous networks. We show (1) ABC is impossible for $n/3< (t+k) <n/2$ if rational players prefer liveness or censorship attacks and (2) the consensus protocol proposed by Ranchal-Pedrosa and Gramoli cannot be generalized to solve ABC due to insecure Nash equilibrium (resulting in disagreement). For ABC in partially synchronous network settings, we propose a novel protocol \textsf{pRFT}(practical Rational Fault Tolerance). We show \textsf{pRFT} achieves ABC if (a) rational players prefer only disagreement attacks and (b) $t < \frac{n}{4}$ and $(t + k) < \frac{n}{2}$. In \textsf{pRFT}, we incorporate accountability (capturing deviating players) within the protocol by leveraging honest players. We also show that the message complexity of \textsf{pRFT} is at par with the best consensus protocols that guarantee accountability. - [444] arXiv:2405.07560 [pdf, ps, other]
-
Title: Coding historical causes of death data with Large Language ModelsBjørn Pedersen, Maisha Islam, Doris Tove Kristoffersen, Lars Ailo Bongo, Eilidh Garrett, Alice Reid, Hilde SommersethComments: 18 pages, 1 figure in main text, 3 figures in appendixSubjects: Machine Learning (cs.LG)
This paper investigates the feasibility of using pre-trained generative Large Language Models (LLMs) to automate the assignment of ICD-10 codes to historical causes of death. Due to the complex narratives often found in historical causes of death, this task has traditionally been manually performed by coding experts. We evaluate the ability of GPT-3.5, GPT-4, and Llama 2 LLMs to accurately assign ICD-10 codes on the HiCaD dataset that contains causes of death recorded in the civil death register entries of 19,361 individuals from Ipswich, Kilmarnock, and the Isle of Skye from the UK between 1861-1901. Our findings show that GPT-3.5, GPT-4, and Llama 2 assign the correct code for 69%, 83%, and 40% of causes, respectively. However, we achieve a maximum accuracy of 89% by standard machine learning techniques. All LLMs performed better for causes of death that contained terms still in use today, compared to archaic terms. Also they perform better for short causes (1-2 words) compared to longer causes. LLMs therefore do not currently perform well enough for historical ICD-10 code assignment tasks. We suggest further fine-tuning or alternative frameworks to achieve adequate performance.
- [445] arXiv:2405.07562 [pdf, ps, html, other]
-
Title: GLiRA: Black-Box Membership Inference Attack via Knowledge DistillationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, there are still several unresolved concerns regarding the privacy of their training data, particularly in the context of vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between the susceptibility to membership inference attacks and the vulnerability to distillation-based functionality stealing attacks. In particular, we propose {GLiRA}, a distillation-guided approach to membership inference attack on the black-box neural network. We observe that the knowledge distillation significantly improves the efficiency of likelihood ratio of membership inference attack, especially in the black-box setting, i.e., when the architecture of the target model is unknown to the attacker. We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks when guided by the knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.
- [446] arXiv:2405.07570 [pdf, ps, other]
-
Title: Gaze-Based Intention Recognition for Human-Robot CollaborationValerio Belcamino, Miwa Takase, Mariya Kilina, Alessandro Carfì, Akira Shimada, Sota Shimizu, Fulvio MastrogiovanniComments: 5 pages, 4 figures, AVI2024 conferenceSubjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
This work aims to tackle the intent recognition problem in Human-Robot Collaborative assembly scenarios. Precisely, we consider an interactive assembly of a wooden stool where the robot fetches the pieces in the correct order and the human builds the parts following the instruction manual. The intent recognition is limited to the idle state estimation and it is needed to ensure a better synchronization between the two agents. We carried out a comparison between two distinct solutions involving wearable sensors and eye tracking integrated into the perception pipeline of a flexible planning architecture based on Hierarchical Task Networks. At runtime, the wearable sensing module exploits the raw measurements from four 9-axis Inertial Measurement Units positioned on the wrists and hands of the user as an input for a Long Short-Term Memory Network. On the other hand, the eye tracking relies on a Head Mounted Display and Unreal Engine.
We tested the effectiveness of the two approaches with 10 participants, each of whom explored both options in alternate order. We collected explicit metrics about the attractiveness and efficiency of the two techniques through User Experience Questionnaires as well as implicit criteria regarding the classification time and the overall assembly time.
The results of our work show that the two methods can reach comparable performances both in terms of effectiveness and user preference. Future development could aim at joining the two approaches two allow the recognition of more complex activities and to anticipate the user actions. - [447] arXiv:2405.07571 [pdf, ps, other]
-
Title: TattTRN: Template Reconstruction Network for Tattoo RetrievalComments: Accepted at CVPR Workshop 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Tattoos have been used effectively as soft biometrics to assist law enforcement in the identification of offenders and victims, as they contain discriminative information, and are a useful indicator to locate members of a criminal gang or organisation. Due to various privacy issues in the acquisition of images containing tattoos, only a limited number of databases exists. This lack of databases has delayed the development of new methods to effectively retrieve a potential suspect's tattoo images from a candidate gallery. To mitigate this issue, in our work, we use an unsupervised generative approach to create a balanced database consisting of 28,550 semi-synthetic images with tattooed subjects from 571 tattoo categories. Further, we introduce a novel Tattoo Template Reconstruction Network (TattTRN), which learns to map the input tattoo sample to its respective tattoo template to enhance the distinguishing attributes of the final feature embedding. Experimental results with real data, i.e., WebTattoo and BIVTatt databases, demonstrate the soundness of the presented approach: an accuracy of up to 99% is achieved for checking at most the first 20 entries of the candidate list.
- [448] arXiv:2405.07573 [pdf, ps, html, other]
-
Title: MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous DrivingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Current multi-modality driving frameworks normally fuse representation by utilizing attention between single-modality branches. However, the existing networks still suppress the driving performance as the Image and LiDAR branches are independent and lack a unified observation representation. Thus, this paper proposes MaskFuser, which tokenizes various modalities into a unified semantic feature space and provides a joint representation for further behavior cloning in driving contexts. Given the unified token representation, MaskFuser is the first work to introduce cross-modality masked auto-encoder training. The masked training enhances the fusion representation by reconstruction on masked tokens. Architecturally, a hybrid-fusion network is proposed to combine advantages from both early and late fusion: For the early fusion stage, modalities are fused by performing monotonic-to-BEV translation attention between branches; Late fusion is performed by tokenizing various modalities into a unified token space with shared encoding on it. MaskFuser respectively reaches a driving score of 49.05 and route completion of 92.85% on the CARLA LongSet6 benchmark evaluation, which improves the best of previous baselines by 1.74 and 3.21%. The introduced masked fusion increases driving stability under damaged sensory inputs. MaskFuser outperforms the best of previous baselines on driving score by 6.55 (27.8%), 1.53 (13.8%), 1.57 (30.9%), respectively given sensory masking ratios 25%, 50%, and 75%.
- [449] arXiv:2405.07580 [pdf, ps, html, other]
-
Title: DynLLM: When Large Language Models Meet Dynamic Graph RecommendationComments: 11 pages, 5 figuresSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Last year has witnessed the considerable interest of Large Language Models (LLMs) for their potential applications in recommender systems, which may mitigate the persistent issue of data sparsity. Though large efforts have been made for user-item graph augmentation with better graph-based recommendation performance, they may fail to deal with the dynamic graph recommendation task, which involves both structural and temporal graph dynamics with inherent complexity in processing time-evolving data. To bridge this gap, in this paper, we propose a novel framework, called DynLLM, to deal with the dynamic graph recommendation task with LLMs. Specifically, DynLLM harnesses the power of LLMs to generate multi-faceted user profiles based on the rich textual features of historical purchase records, including crowd segments, personal interests, preferred categories, and favored brands, which in turn supplement and enrich the underlying relationships between users and items. Along this line, to fuse the multi-faceted profiles with temporal graph embedding, we engage LLMs to derive corresponding profile embeddings, and further employ a distilled attention mechanism to refine the LLM-generated profile embeddings for alleviating noisy signals, while also assessing and adjusting the relevance of each distilled facet embedding for seamless integration with temporal graph embedding from continuous time dynamic graphs (CTDGs). Extensive experiments on two real e-commerce datasets have validated the superior improvements of DynLLM over a wide range of state-of-the-art baseline methods.
- [450] arXiv:2405.07582 [pdf, ps, other]
-
Title: FRRffusion: Unveiling Authenticity with Diffusion-Based Face Retouching ReversalSubjects: Computer Vision and Pattern Recognition (cs.CV)
Unveiling the real appearance of retouched faces to prevent malicious users from deceptive advertising and economic fraud has been an increasing concern in the era of digital economics. This article makes the first attempt to investigate the face retouching reversal (FRR) problem. We first collect an FRR dataset, named deepFRR, which contains 50,000 StyleGAN-generated high-resolution (1024*1024) facial images and their corresponding retouched ones by a commercial online API. To our best knowledge, deepFRR is the first FRR dataset tailored for training the deep FRR models. Then, we propose a novel diffusion-based FRR approach (FRRffusion) for the FRR task. Our FRRffusion consists of a coarse-to-fine two-stage network: A diffusion-based Facial Morpho-Architectonic Restorer (FMAR) is constructed to generate the basic contours of low-resolution faces in the first stage, while a Transformer-based Hyperrealistic Facial Detail Generator (HFDG) is designed to create high-resolution facial details in the second stage. Tested on deepFRR, our FRRffusion surpasses the GP-UNIT and Stable Diffusion methods by a large margin in four widespread quantitative metrics. Especially, the de-retouched images by our FRRffusion are visually much closer to the raw face images than both the retouched face images and those restored by the GP-UNIT and Stable Diffusion methods in terms of qualitative evaluation with 85 subjects. These results sufficiently validate the efficacy of our work, bridging the recently-standing gap between the FRR and generic image restoration tasks. The dataset and code are available at this https URL.
- [451] arXiv:2405.07584 [pdf, ps, other]
-
Title: La ROUTOURNE va tournerQuentin Bramas (ICube, UNISTRA), Jean-Romain Luttringer (ICube, UNISTRA), Pascal Mérindol (ICube, UNISTRA)Comments: in French language. AlgoTel 2024 -- 26{è}mes Rencontres Francophones sur les Aspects Algorithmiques des T{é}l{é}communications, May 2024, Saint-Briac-sur-Mer, FranceSubjects: Networking and Internet Architecture (cs.NI)
Segment routing (SR) offers precise control over the paths taken: it specifies a list of detours, called segments, in IP packets. However, the number of detours that can be specified is limited by the hardware. When calculating segment lists, it is therefore necessary to limit their size. Although solutions have been proposed for calculating these lists, they lack generality and are not always optimal or efficient. We present ROUTOURNE, a method for diverting routing algorithms so that they calculate, not simply an optimal physical path to be translated into a list of segments a posteriori (with no guarantee of its size), but directly the optimal lists of segments deployable by the underlying hardware. ROUTOURNE thus facilitates the deployment of advanced traffic engineering strategies and policies, notably for load balancing from sources. Despite a route fraught with surprising challenges - in particular, the loss of isotonicity induced by SR - ROUTOURNE proves efficient, inducing at worst a linear overhead. Its accuracy and optimality have been proven, and its effectiveness evaluated by generalizing it to several more or less complex path calculation algorithms.
- [452] arXiv:2405.07585 [pdf, ps, other]
-
Title: On the Coexistence of eMBB and URLLC in the Cell-Free Massive MIMO DownlinkComments: Paper submitted for presentation to an IEEE conference. © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other usesSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
We investigate the non-orthogonal coexistence between the ultra-reliable low-latency communication (URLLC) and the enhanced mobile broadband (eMBB) in the downlink of a cell-free massive multiple-input multiple-output (MIMO) system. We provide a unified information-theoretic framework that combines a finite-blocklength analysis of the URLLC error probability based on the use of mismatched decoding with an infinite-blocklength analysis of the eMBB spectral efficiency. Superposition coding and three levels of puncturing are considered as alternative downlink coexistence strategies to cope with the inter-service interference and the URLLC random activation pattern, under the assumption of imperfect pilot-based channel state information acquisition at the access points and statistical channel knowledge at the users. Numerical results shed light into the trade-off between eMBB and URLLC performances considering different precoding and power control strategies.
- [453] arXiv:2405.07586 [pdf, ps, html, other]
-
Title: Thai Universal Dependency TreebankPanyur Sriwirote, Wei Qi Leong, Charin Polpanumas, Santhawat Thanyawong, William Chandra Tjhi, Wirote Aroonmanakun, Attapol T. RutherfordSubjects: Computation and Language (cs.CL)
Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published systematic evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we address these problems by introducing Thai Universal Dependency Treebank (TUD), a new largest Thai treebank consisting of 3,627 trees annotated in accordance with the Universal Dependencies (UD) framework. We then benchmark dependency parsing models that incorporate pretrained transformers as encoders and train them on Thai-PUD and our TUD. The evaluation results show that most of our models can outperform other models reported in previous papers and provide insight into the optimal choices of components to include in Thai dependency parsers. The new treebank and every model's full prediction generated in our experiment are made available on a GitHub repository for further study.
- [454] arXiv:2405.07587 [pdf, ps, other]
-
Title: Structure-Preserving Model Order Reduction for Nonlinear DAE Models of Power NetworksSubjects: Systems and Control (eess.SY)
This paper deals with the joint reduction of dynamic states (internal states of generator, solar, and loads, etc) and algebraic variables (states of the network e.g., voltage and phase angles) of a nonlinear differential-algebraic equation (NDAE) model of power networks. Traditionally, in the current literature of power systemmodel order reduction (MOR), the algebraic constraints are usually neglected and the power network is commonly modeled via a set of ordinary differential equations (ODEs) instead of NDAEs. Thus, reduction is usually carried out for the dynamic states only and the algebraic variables are kept intact. This leaves a significant part of the system's size and complexity unreduced. This paper addresses this aforementioned limitation, by jointly reducing both dynamic and algebraic variables. As compared to the literature the proposedMOR techniques herein are endowed with the following features: (i) no system linearization is required, (ii) requires no transformation to an equivalent or approximate ODE representation, (iii) guarantee that the reduced order model to be NDAE and thus preserves the differential-algebraic structure of original power system model, and (iv) can seamlessly reduce both dynamic and algebraic variables while maintaining high accuracy. Case studies performed on a 2000-bus power system reveal that the proposedMOR techniques are able to reduce system order while maintaining accuracy
- [455] arXiv:2405.07588 [pdf, ps, other]
-
Title: Practical Computation of Graph VC-DimensionDavid Coudert (COATI), Mónika Csikós (IRIF (UMR\_8243)), Guillaume Ducoffe (UniBuc, ICI), Laurent Viennot (DI-ENS, ARGO)Journal-ref: Symposium on Experimental Algorithms (SEA) 2024, Jul 2024, Vienne, AustriaSubjects: Data Structures and Algorithms (cs.DS)
For any set system $H=(V,R), \ R \subseteq 2^V$, a subset $S \subseteq V$ is called \emph{shattered} if every $S' \subseteq S$ results from the intersection of $S$ with some set in $\R$. The \emph{VC-dimension} of $H$ is the size of a largest shattered set in $V$. In this paper, we focus on the problem of computing the VC-dimension of graphs. In particular, given a graph $G=(V,E)$, the VC-dimension of $G$ is defined as the VC-dimension of $(V, \mathcal N)$, where $\mathcal N$ contains each subset of $V$ that can be obtained as the closed neighborhood of some vertex $v \in V$ in $G$. Our main contribution is an algorithm for computing the VC-dimension of any graph, whose effectiveness is shown through experiments on various types of practical graphs, including graphs with millions of vertices. A key aspect of its efficiency resides in the fact that practical graphs have small VC-dimension, up to 8 in our experiments. As a side-product, we present several new bounds relating the graph VC-dimension to other classical graph theoretical notions. We also establish the $W[1]$-hardness of the graph VC-dimension problem by extending a previous result for arbitrary set systems.
- [456] arXiv:2405.07590 [pdf, ps, html, other]
-
Title: Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series DataCamelia Oprea, Mike Grüne, Mateusz Buglowski, Lena Olivier, Thorsten Orlikowsky, Stefan Kowalewski, Mark Schoberer, André StollenwerkComments: \c{opyright} 2024 The authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND. Accepted for the 12th IFAC Symposium on Biological and Medical Systems. 6 pages, 7 figuresSubjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
With the digitalization of health care systems, artificial intelligence becomes more present in medicine. Especially machine learning shows great potential for complex tasks such as time series classification, usually at the cost of transparency and comprehensibility. This leads to a lack of trust by humans and thus hinders its active usage. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process, the actual usefulness of its different methods is however unclear. This paper proposes a user study based evaluation of the explanation method Grad-CAM with application to a neural network for the classification of breaths in time series neonatal ventilation data. We present the perceived usefulness of the explainability method by different stakeholders, exposing the difficulty to achieve actual transparency and the wish for more in-depth explanations by many of the participants.
- [457] arXiv:2405.07591 [pdf, ps, html, other]
-
Title: A Partially Defined Game with PaymentsSubjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
We investigate a new problem that can be solved by using the theory of a partially defined game. We consider the situation described below: first, we assume that the worth of the grand and singleton coalitions is only known. It take some amount of costs to obtain worth of larger coalitions. If it is performed, then players make a payment from the worth of the grand coalition. That is, the worth of the grand coalition is reduced by examinations of coalitional worth. The problem of a partially defined game with payments is finding the solution of partially defined games at each point and the best exiting rule of examinations of coalitional worth.
- [458] arXiv:2405.07594 [pdf, ps, other]
-
Title: RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud RegistrationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Point cloud registration is a fundamental task for estimating rigid transformations between point clouds. Previous studies have used geometric information for extracting features, matching and estimating transformation. Recently, owing to the advancement of RGB-D sensors, researchers have attempted to utilize visual information to improve registration performance. However, these studies focused on extracting distinctive features by deep feature fusion, which cannot effectively solve the negative effects of each feature's weakness, and cannot sufficiently leverage the valid information. In this paper, we propose a new feature combination framework, which applies a looser but more effective fusion and can achieve better performance. An explicit filter based on transformation consistency is designed for the combination framework, which can overcome each feature's weakness. And an adaptive threshold determined by the error distribution is proposed to extract more valid information from the two types of features. Owing to the distinctive design, our proposed framework can estimate more accurate correspondences and is applicable to both hand-crafted and learning-based feature descriptors. Experiments on ScanNet show that our method achieves a state-of-the-art performance and the rotation accuracy of 99.1%.
- [459] arXiv:2405.07595 [pdf, ps, other]
-
Title: Environmental Matching Attack Against Unmanned Aerial Vehicles Object DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Object detection techniques for Unmanned Aerial Vehicles (UAVs) rely on Deep Neural Networks (DNNs), which are vulnerable to adversarial attacks. Nonetheless, adversarial patches generated by existing algorithms in the UAV domain pay very little attention to the naturalness of adversarial patches. Moreover, imposing constraints directly on adversarial patches makes it difficult to generate patches that appear natural to the human eye while ensuring a high attack success rate. We notice that patches are natural looking when their overall color is consistent with the environment. Therefore, we propose a new method named Environmental Matching Attack(EMA) to address the issue of optimizing the adversarial patch under the constraints of color. To the best of our knowledge, this paper is the first to consider natural patches in the domain of UAVs. The EMA method exploits strong prior knowledge of a pretrained stable diffusion to guide the optimization direction of the adversarial patch, where the text guidance can restrict the color of the patch. To better match the environment, the contrast and brightness of the patch are appropriately adjusted. Instead of optimizing the adversarial patch itself, we optimize an adversarial perturbation patch which initializes to zero so that the model can better trade off attacking performance and naturalness. Experiments conducted on the DroneVehicle and Carpk datasets have shown that our work can reach nearly the same attack performance in the digital attack(no greater than 2 in mAP$\%$), surpass the baseline method in the physical specific scenarios, and exhibit a significant advantage in terms of naturalness in visualization and color difference with the environment.
- [460] arXiv:2405.07596 [pdf, ps, other]
-
Title: Local Mutual-Information Differential PrivacyComments: submitted to the IEEE Information Theory Workshop (ITW) 2024Subjects: Information Theory (cs.IT)
Local mutual-information differential privacy (LMIDP) is a privacy notion that aims to quantify the reduction of uncertainty about the input data when the output of a privacy-preserving mechanism is revealed. We study the relation of LMIDP with local differential privacy (LDP), the de facto standard notion of privacy in context-independent (CI) scenarios, and with local information privacy (LIP), the state-of-the-art notion for context-dependent settings. We establish explicit conversion rules, i.e., bounds on the privacy parameters for a LMIDP mechanism to also satisfy LDP/LIP, and vice versa. We use our bounds to formally verify that LMIDP is a weak privacy notion. We also show that uncorrelated Gaussian noise is the best-case noise in terms of CI-LMIDP if both the input data and the noise are subject to an average power constraint.
- [461] arXiv:2405.07597 [pdf, ps, other]
-
Title: Using Model-Theoretic Approaches to Uncover Linguistic OrganizationSubjects: Computation and Language (cs.CL)
In this paper, we consider pluractional markers in Kaqchikel, Karuk, and Yurok. Like Balinese, each of these languages marks one type of pluractionality via reduplication, and a different type of pluractionality via non-reduplicative affixation. This paper serves as a proof-of-concept for applying model-theoretic approaches to language as a lens that can help us to recognize linguistic organization that is not apparent on the surface.
- [462] arXiv:2405.07600 [pdf, ps, other]
-
Title: Integrity Monitoring of 3D Object Detection in Automated Driving Systems using Raw Activation Patterns and Spatial FilteringComments: Submitted to ITSC 2024. arXiv admin note: text overlap with arXiv:2404.07685Subjects: Computer Vision and Pattern Recognition (cs.CV)
The deep neural network (DNN) models are widely used for object detection in automated driving systems (ADS). Yet, such models are prone to errors which can have serious safety implications. Introspection and self-assessment models that aim to detect such errors are therefore of paramount importance for the safe deployment of ADS. Current research on this topic has focused on techniques to monitor the integrity of the perception mechanism in ADS. Existing introspection models in the literature, however, largely concentrate on detecting perception errors by assigning equal importance to all parts of the input data frame to the perception module. This generic approach overlooks the varying safety significance of different objects within a scene, which obscures the recognition of safety-critical errors, posing challenges in assessing the reliability of perception in specific, crucial instances. Motivated by this shortcoming of state of the art, this paper proposes a novel method integrating raw activation patterns of the underlying DNNs, employed by the perception module, analysis with spatial filtering techniques. This novel approach enhances the accuracy of runtime introspection of the DNN-based 3D object detections by selectively focusing on an area of interest in the data, thereby contributing to the safety and efficacy of ADS perception self-assessment processes.
- [463] arXiv:2405.07601 [pdf, ps, other]
-
Title: On-device Online Learning and Semantic Management of TinyML SystemsComments: Accepted by Journal Transactions on Embedded Computing Systems (TECS)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Recent advances in Tiny Machine Learning (TinyML) empower low-footprint embedded devices for real-time on-device Machine Learning. While many acknowledge the potential benefits of TinyML, its practical implementation presents unique challenges. This study aims to bridge the gap between prototyping single TinyML models and developing reliable TinyML systems in production: (1) Embedded devices operate in dynamically changing conditions. Existing TinyML solutions primarily focus on inference, with models trained offline on powerful machines and deployed as static objects. However, static models may underperform in the real world due to evolving input data distributions. We propose online learning to enable training on constrained devices, adapting local models towards the latest field conditions. (2) Nevertheless, current on-device learning methods struggle with heterogeneous deployment conditions and the scarcity of labeled data when applied across numerous devices. We introduce federated meta-learning incorporating online learning to enhance model generalization, facilitating rapid learning. This approach ensures optimal performance among distributed devices by knowledge sharing. (3) Moreover, TinyML's pivotal advantage is widespread adoption. Embedded devices and TinyML models prioritize extreme efficiency, leading to diverse characteristics ranging from memory and sensors to model architectures. Given their diversity and non-standardized representations, managing these resources becomes challenging as TinyML systems scale up. We present semantic management for the joint management of models and devices at scale. We demonstrate our methods through a basic regression example and then assess them in three real-world TinyML applications: handwritten character image classification, keyword audio classification, and smart building presence detection, confirming our approaches' effectiveness.
- [464] arXiv:2405.07603 [pdf, ps, other]
-
Title: Reducing Risk for Assistive Reinforcement Learning Policies with Diffusion ModelsSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Care-giving and assistive robotics, driven by advancements in AI, offer promising solutions to meet the growing demand for care, particularly in the context of increasing numbers of individuals requiring assistance. This creates a pressing need for efficient and safe assistive devices, particularly in light of heightened demand due to war-related injuries. While cost has been a barrier to accessibility, technological progress is able to democratize these solutions. Safety remains a paramount concern, especially given the intricate interactions between assistive robots and humans. This study explores the application of reinforcement learning (RL) and imitation learning, in improving policy design for assistive robots. The proposed approach makes the risky policies safer without additional environmental interactions. Through experimentation using simulated environments, the enhancement of the conventional RL approaches in tasks related to assistive robotics is demonstrated.
- [465] arXiv:2405.07604 [pdf, ps, html, other]
-
Title: Improving classifier-based effort-aware software defect prediction by reducing ranking errorsComments: 10 pages with 12 figures. Accepted by International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024Subjects: Software Engineering (cs.SE)
Context: Software defect prediction utilizes historical data to direct software quality assurance resources to potentially problematic components. Effort-aware (EA) defect prediction prioritizes more bug-like components by taking cost-effectiveness into account. In other words, it is a ranking problem, however, existing ranking strategies based on classification, give limited consideration to ranking errors. Objective: Improve the performance of classifier-based EA ranking methods by focusing on ranking errors. Method: We propose a ranking score calculation strategy called EA-Z which sets a lower bound to avoid near-zero ranking errors. We investigate four primary EA ranking strategies with 16 classification learners, and conduct the experiments for EA-Z and the other four existing strategies. Results: Experimental results from 72 data sets show EA-Z is the best ranking score calculation strategy in terms of Recall@20% and Popt when considering all 16 learners. For particular learners, imbalanced ensemble learner UBag-svm and UBst-rf achieve top performance with EA-Z. Conclusion: Our study indicates the effectiveness of reducing ranking errors for classifier-based effort-aware defect prediction. We recommend using EA-Z with imbalanced ensemble learning.
- [466] arXiv:2405.07605 [pdf, ps, other]
-
Title: Empirical Application Insights on Industrial Data and Service Aspects of Digital Twin NetworksMarco Becattini, Davide Borsatti, Armir Bujari, Laura Carnevali, Andrea Garbugli, Hrant Khachatrian, Theofanis P. Raptis, Daniele TarchiComments: Funding: (i) European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, partnership on "Telecommunications of the Future" PE00000001 - program "RESTART", (ii) RA Science Committee grant No. 22rl-052Journal-ref: IEEE MeditCom 2024Subjects: Networking and Internet Architecture (cs.NI); Emerging Technologies (cs.ET)
Digital twin networks (DTNs) serve as an emerging facilitator in the industrial networking sector, enabling the management of new classes of services, which require tailored support for improved resource utilization, low latencies and accurate data fidelity. In this paper, we explore the intersection between theoretical recommendations and practical implications of applying DTNs to industrial networked environments, sharing empirical findings and lessons learned from our ongoing work. To this end, we first provide experimental examples from selected aspects of data representations and fidelity, mixed-criticality workload support, and application-driven services. Then, we introduce an architectural framework for DTNs, exposing a more practical extension of existing standards; notably the ITU-T Y.3090 (2022) recommendation. Specifically, we explore and discuss the dual nature of DTNs, meant as a digital twin of the network and a network of digital twins, allowing the co-existence of both paradigms.
- [467] arXiv:2405.07606 [pdf, ps, other]
-
Title: AIris: An AI-powered Wearable Assistive Device for the Visually ImpairedSubjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
Assistive technologies for the visually impaired have evolved to facilitate interaction with a complex and dynamic world. In this paper, we introduce AIris, an AI-powered wearable device that provides environmental awareness and interaction capabilities to visually impaired users. AIris combines a sophisticated camera mounted on eyewear with a natural language processing interface, enabling users to receive real-time auditory descriptions of their surroundings. We have created a functional prototype system that operates effectively in real-world conditions. AIris demonstrates the ability to accurately identify objects and interpret scenes, providing users with a sense of spatial awareness previously unattainable with traditional assistive devices. The system is designed to be cost-effective and user-friendly, supporting general and specialized tasks: face recognition, scene description, text reading, object recognition, money counting, note-taking, and barcode scanning. AIris marks a transformative step, bringing AI enhancements to assistive technology, enabling rich interactions with a human-like feel.
- [468] arXiv:2405.07608 [pdf, ps, other]
-
Title: FNCC: Fast Notification Congestion Control in Data Center NetworksJing Xu, Zhan Wang, Fan Yang, Ning Kang, Zhenlong Ma, Xiaoyi Lu, Rui Miao, Guojun Yuan, Guangming Tan, Ninghui SunSubjects: Networking and Internet Architecture (cs.NI)
Congestion control plays a pivotal role in large-scale data centers, facilitating ultra-low latency, high bandwidth, and optimal utilization. Even with the deployment of data center congestion control mechanisms such as DCQCN and HPCC, these algorithms often respond to congestion sluggishly. This sluggishness is primarily due to the slow notification of congestion. It takes almost one round-trip time (RTT) for the congestion information to reach the sender. In this paper, we introduce the Fast Notification Congestion Control (FNCC) mechanism, which achieves sub-RTT notification. FNCC leverages the acknowledgment packet (ACK) from the return path to carry in-network telemetry (INT) information of the request path, offering the sender more timely and accurate INT. To further accelerate the responsiveness of last-hop congestion control, we propose that the receiver notifies the sender of the number of concurrent congested flows, which can be used to adjust the congested flows to a fair rate quickly. Our experimental results demonstrate that FNCC reduces flow completion time by 27.4% and 88.9% compared to HPCC and DCQCN, respectively. Moreover, FNCC triggers minimal pause frames and maintains high utilization even at 400Gbps.
- [469] arXiv:2405.07609 [pdf, ps, other]
-
Title: NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity RecognitionComments: data available at this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Available training data for named entity recognition (NER) often contains a significant percentage of incorrect labels for entity types and entity boundaries. Such label noise poses challenges for supervised learning and may significantly deteriorate model quality. To address this, prior work proposed various noise-robust learning approaches capable of learning from data with partially incorrect labels. These approaches are typically evaluated using simulated noise where the labels in a clean dataset are automatically corrupted. However, as we show in this paper, this leads to unrealistic noise that is far easier to handle than real noise caused by human error or semi-automatic annotation. To enable the study of the impact of various types of real noise, we introduce NoiseBench, an NER benchmark consisting of clean training data corrupted with 6 types of real noise, including expert errors, crowdsourcing errors, automatic annotation errors and LLM errors. We present an analysis that shows that real noise is significantly more challenging than simulated noise, and show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound. We release NoiseBench to the research community.
- [470] arXiv:2405.07611 [pdf, ps, other]
-
Title: Uncovering GNSS Interference with Aerial Mapping UAVComments: In proceedings of the 2024 IEEE Aerospace Conference (AeroConf)Subjects: Cryptography and Security (cs.CR)
Global Navigation Satellite System (GNSS) receivers provide ubiquitous and precise position, navigation, and time (PNT) to a wide gamut of civilian and tactical infrastructures and devices. Due to the low GNSS received signal power, even low-power radiofrequency interference (RFI) sources are a serious threat to the GNSS integrity and availability. Nonetheless, RFI source localization is paramount yet hard, especially over large areas. Methods based on multi-rotor unmanned aerial vehicles (UAV) exist but are often limited by hovering time, and require specific antenna and detectors. In comparison, fixed-wing planes allow longer missions but are more complex to operate and deploy. A vertical take-off and landing (VTOL) UAV combines the positive aspects of both platforms: high maneuverability, and long mission time and, jointly with highly integrated control systems, simple operation and deployment. Building upon the flexibility allowed by such a platform, we propose a method that combines advanced flight dynamics with high-performance consumer receivers to detect interference over large areas, with minimal interaction with the operator. The proposed system can detect multiple interference sources and map their area of influence, gaining situational awareness of poor GNSS quality or denied environments. Furthermore, it can estimate the relative heading and position of the interference source within tens of meters. The proposed method is validated with real-life measurements, successfully mapping two interference-affected areas and exposing radio equipment causing involuntary in-band interference.
- [471] arXiv:2405.07615 [pdf, ps, other]
-
Title: ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge SourceSubjects: Computation and Language (cs.CL)
Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research to solve the problem mainly concentrated on huge communities like English and Chinese. Low-resource languages like Vietnamese are necessary to explore corpora and models for fact verification. To bridge this gap, we construct ViWikiFC, the first manual annotated open-domain corpus for Vietnamese Wikipedia Fact Checking more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze our corpus through many linguistic aspects, from the new dependency rate, the new n-gram rate, and the new word rate. We conducted various experiments for Vietnamese fact-checking, including evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieved the best results in two tasks, with BM25 achieving an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for the NEI label in the evidence retrieval task, InfoXLM (Large) achieved an F1 score of 86.51%. Furthermore, we also conducted a pipeline approach, which only achieved a strict accuracy of 67.00% when using InfoXLM (Large) and BM25. These results demonstrate that our dataset is challenging for the Vietnamese language model in fact-checking tasks.
- [472] arXiv:2405.07616 [pdf, ps, other]
-
Title: Conditional well-posedness and data-driven method for identifying the dynamic source in a coupled diffusion system from one single boundary measurementComments: arXiv admin note: text overlap with arXiv:2307.14348Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Analysis of PDEs (math.AP)
This work considers the inverse dynamic source problem arising from the time-domain fluorescence diffuse optical tomography (FDOT). We recover the dynamic distributions of fluorophores in biological tissue by the one single boundary measurement in finite time domain. We build the uniqueness theorem of this inverse problem. After that, we introduce a weighted norm and establish the conditional stability of Lipschitz type for the inverse problem by this weighted norm. The numerical inversions are considered under the framework of the deep neural networks (DNNs). We establish the generalization error estimates rigorously derived from Lipschitz conditional stability of inverse problem. Finally, we propose the reconstruction algorithms and give several numerical examples illustrating the performance of the proposed inversion schemes.
- [473] arXiv:2405.07620 [pdf, ps, other]
-
Title: New Low-Dissipation Central-Upwind Schemes. Part IISubjects: Numerical Analysis (math.NA)
The low-dissipation central-upwind (LDCU) schemes have been recently introduced in [A. Kurganov and R. Xin, J. Sci. Comput., 96 (2023), Paper No. 56] as a modification of the central-upwind (CU) schemes from [{\sc A. Kurganov and C. T. Lin, Commun. Comput. Phys., 2 (2007), pp. 141-163}]. The LDCU schemes achieve much higher resolution of contact waves and many (two-dimensional) structures resulting from complicated wave interaction. However, the LDCU schemes sometimes produce more oscillatory results compared with the CU schemes, especially near the computational domain boundaries.
In this paper, we propose a very simple -- yet systematic -- modification of the LDCU schemes, which completely eliminates the aforementioned oscillations almost without affecting the quality of the computed solution. - [474] arXiv:2405.07621 [pdf, ps, other]
-
Title: Towards Adaptive IMFs -- Generalization of utility functions in Multi-Agent FrameworksComments: Accepted in Netsoft-2024 conferenceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Intent Management Function (IMF) is an integral part of future-generation networks. In recent years, there has been some work on AI-based IMFs that can handle conflicting intents and prioritize the global objective based on apriori definition of the utility function and accorded priorities for competing intents. Some of the earlier works use Multi-Agent Reinforcement Learning (MARL) techniques with AdHoc Teaming (AHT) approaches for efficient conflict handling in IMF. However, the success of such frameworks in real-life scenarios requires them to be flexible to business situations. The intent priorities can change and the utility function, which measures the extent of intent fulfilment, may also vary in definition. This paper proposes a novel mechanism whereby the IMF can generalize to different forms of utility functions and change of intent priorities at run-time without additional training. Such generalization ability, without additional training requirements, would help to deploy IMF in live networks where customer intents and priorities change frequently. Results on the network emulator demonstrate the efficacy of the approach, scalability for new intents, outperforming existing techniques that require additional training to achieve the same degree of flexibility thereby saving cost, and increasing efficiency and adaptability.
- [475] arXiv:2405.07623 [pdf, ps, other]
-
Title: COBias and Debias: Minimizing Language Model Pairwise Accuracy Bias via Nonlinear Integer ProgrammingSubjects: Computation and Language (cs.CL)
For language model classification, would you prefer having only one workable class or having every class working? The latter makes more practical uses. Especially for large language models (LLMs), the fact that they achieve a fair overall accuracy by in-context learning (ICL) obscures a large difference in individual class accuracies. In this work, we uncover and tackle language models' imbalance in per-class prediction accuracy by reconceptualizing it as the Contextual Oddity Bias (COBias), and we are the first to engage nonlinear integer programming (NIP) to debias it. Briefly, COBias refers to the difference in accuracy by a class A compared to its ''odd'' class, which holds the majority wrong predictions of class A. With the COBias metric, we reveal that LLMs of varied scales and families exhibit large per-class accuracy differences. Then we propose Debiasing as Nonlinear Integer Programming (DNIP) to correct ICL per-class probabilities for lower bias and higher overall accuracy. Our optimization objective is directly based on the evaluation scores by COBias and accuracy metrics, solved by simulated annealing. Evaluations on three LLMs across seven NLP classification tasks show that DNIP simultaneously achieves significant COBias reduction ($-27\%$) and accuracy improvement ($+12\%$) over the conventional ICL approach, suggesting that modeling pairwise class accuracy differences is a direction in pushing forward more accurate, more reliable LLM predictions.
- [476] arXiv:2405.07626 [pdf, ps, other]
-
Title: AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language ModelsComments: 13pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Detecting anomaly edges for dynamic graphs aims to identify edges significantly deviating from the normal pattern and can be applied in various domains, such as cybersecurity, financial transactions and AIOps. With the evolving of time, the types of anomaly edges are emerging and the labeled anomaly samples are few for each type. Current methods are either designed to detect randomly inserted edges or require sufficient labeled data for model training, which harms their applicability for real-world applications. In this paper, we study this problem by cooperating with the rich knowledge encoded in large language models(LLMs) and propose a method, namely AnomalyLLM. To align the dynamic graph with LLMs, AnomalyLLM pre-trains a dynamic-aware encoder to generate the representations of edges and reprograms the edges using the prototypes of word embeddings. Along with the encoder, we design an in-context learning framework that integrates the information of a few labeled samples to achieve few-shot anomaly detection. Experiments on four datasets reveal that AnomalyLLM can not only significantly improve the performance of few-shot anomaly detection, but also achieve superior results on new anomalies without any update of model parameters.
- [477] arXiv:2405.07627 [pdf, ps, other]
-
Title: End-to-End Delivery in LEO Mega-constellations and the Reordering ProblemRasmus Sibbern Frederiksen, Thomas Gundgaard Mulvad, Israel Leyva-Mayorga, Tatiana Kozlova Madsen, Federico ChiariottiComments: Submitted to IEEE PIMRC Workshops 2024Subjects: Networking and Internet Architecture (cs.NI)
Low Earth orbit (LEO) satellite mega-constellations with hundreds or thousands of satellites and inter-satellite links (ISLs) have the potential to provide global end-to-end connectivity. Furthermore, if the physical distance between source and destination is sufficiently long, end-to-end routing over the LEO constellation can provide lower latency when compared to the terrestrial infrastructure due to the faster propagation of electromagnetic waves in space than in optic fiber. However, the frequent route changes due to the movement of the satellites result in the out-of-order delivery of packets, causing sudden changes to the Round-Trip Time (RTT) that can be misinterpreted as congestion by congestion control algorithms. In this paper, the performance of three widely used congestion control algorithms, Cubic, Reno, and BBR, is evaluated in an emulated LEO satellite constellation with Free-Space Optical (FSO) ISLs. Furthermore, we perform a sensitivity analysis for Cubic by changing the satellite constellation parameters, length of the routes, and the positions of the source and destination to identify problematic routing scenarios. The results show that route changes can have profound transient effects on the goodput of the connection, posing problems for typical broadband applications.
- [478] arXiv:2405.07637 [pdf, ps, other]
-
Title: Near-Optimal Regret in Linear MDPs with Aggregate Bandit FeedbackSubjects: Machine Learning (cs.LG)
In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends. To this end, we study the recently proposed model of RL with Aggregate Bandit Feedback (RL-ABF), where the agent only observes the sum of rewards at the end of an episode instead of each reward individually. Prior work studied RL-ABF only in tabular settings, where the number of states is assumed to be small. In this paper, we extend ABF to linear function approximation and develop two efficient algorithms with near-optimal regret guarantees: a value-based optimistic algorithm built on a new randomization technique with a Q-functions ensemble, and a policy optimization algorithm that uses a novel hedging scheme over the ensemble.
- [479] arXiv:2405.07638 [pdf, ps, other]
-
Title: DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoSQingyang Li, Yihang Zhang, Zhidong Jia, Yannan Hu, Lei Zhang, Jianrong Zhang, Yongming Xu, Yong Cui, Zongming Guo, Xinggong ZhangSubjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
It is an interesting question Can and How Large Language Models (LLMs) understand non-language network data, and help us detect unknown malicious flows. This paper takes Carpet Bombing as a case study and shows how to exploit LLMs' powerful capability in the networking area. Carpet Bombing is a new DDoS attack that has dramatically increased in recent years, significantly threatening network infrastructures. It targets multiple victim IPs within subnets, causing congestion on access links and disrupting network services for a vast number of users. Characterized by low-rates, multi-vectors, these attacks challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model utilizes open-source LLMs as backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into LLMs semantic space as token embeddings, DoLLM leverages LLMs' contextual understanding to extract flow representations in overall network context. The representations are used to improve the DDoS detection performance. We evaluate DoLLM with public datasets CIC-DDoS2019 and real NetFlow trace from Top-3 countrywide ISP. The tests have proven that DoLLM possesses strong detection capabilities. Its F1 score increased by up to 33.3% in zero-shot scenarios and by at least 20.6% in real ISP traces.
- [480] arXiv:2405.07640 [pdf, ps, other]
-
Title: Hyperparameter Importance Analysis for Multi-Objective AutoMLDaphne Theodorakopoulos (1 and 2), Frederic Stahl (1), Marius Lindauer (2 and 3) ((1) Marine Perception Research Department, German Research Center for Artificial Intelligence (DFKI), (2) Institute of Artificial Intelligence (LUH|AI), L3S Research Center, Leibniz University Hannover, (3) L3S Research Center)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Hyperparameter optimization plays a pivotal role in enhancing the predictive performance and generalization capabilities of ML models. However, in many applications, we do not only care about predictive performance but also about objectives such as inference time, memory, or energy consumption. In such MOO scenarios, determining the importance of hyperparameters poses a significant challenge due to the complex interplay between the conflicting objectives. In this paper, we propose the first method for assessing the importance of hyperparameters in the context of multi-objective hyperparameter optimization. Our approach leverages surrogate-based hyperparameter importance (HPI) measures, i.e. fANOVA and ablation paths, to provide insights into the impact of hyperparameters on the optimization objectives. Specifically, we compute the a-priori scalarization of the objectives and determine the importance of the hyperparameters for different objective tradeoffs. Through extensive empirical evaluations on diverse benchmark datasets with three different objectives paired with accuracy, namely time, demographic parity, and energy consumption, we demonstrate the effectiveness and robustness of our proposed method. Our findings not only offer valuable guidance for hyperparameter tuning in MOO tasks but also contribute to advancing the understanding of HPI in complex optimization scenarios.
- [481] arXiv:2405.07644 [pdf, ps, other]
-
Title: A Hessian-Based Field Deformer for Real-Time Topology-Aware Shape EditingYunxiao Zhang, Zixiong Wang, Zihan Zhao, Rui Xu, Shuangmin Chen, Shiqing Xin, Wenping Wang, Changhe TuComments: 10 pages, 18 figuresSubjects: Graphics (cs.GR)
Shape manipulation is a central research topic in computer graphics. Topology editing, such as breaking apart connections, joining disconnected ends, and filling/opening a topological hole, is generally more challenging than geometry editing. In this paper, we observe that the saddle points of the signed distance function (SDF) provide useful hints for altering surface topology deliberately. Based on this key observation, we parameterize the SDF into a cubic trivariate tensor-product B-spline function $F$ whose saddle points $\{\boldsymbol{s}_i\}$ can be quickly exhausted based on a subdivision-based root-finding technique coupled with Newton's method. Users can select one of the candidate points, say $\boldsymbol{s}_i$, to edit the topology in real time. In implementation, we add a compactly supported B-spline function rooted at $\boldsymbol{s}_i$, which we call a \textit{deformer} in this paper, to $F$, with its local coordinate system aligning with the three eigenvectors of the Hessian. Combined with ray marching technique, our interactive system operates at 30 FPS. Additionally, our system empowers users to create desired bulges or concavities on the surface. An extensive user study indicates that our system is user-friendly and intuitive to operate. We demonstrate the effectiveness and usefulness of our system in a range of applications, including fixing surface reconstruction errors, artistic work design, 3D medical imaging and simulation, and antiquity restoration. Please refer to the attached video for a demonstration.
- [482] arXiv:2405.07647 [pdf, ps, other]
-
Title: Fuzzy Logic Weight based Coordination Scheme for Utilizing Electric Vehicle Charging StationsSubjects: Emerging Technologies (cs.ET)
The larger battery capacities and the longer waiting and charging time of electric vehicles (EVs) results in low utilization of charging stations (CSs). This paper, proposes fuzzy logic weight based coordination (FLWC) scheme to enhance the utilization of CSs. Each EV has an associated uncertain information including stay time and the current state-of-charge (SoC). The fuzzy logic controller (FLC) analyze these inputs and determines a weight value. The proposed FLWC scheme then allocates CSs to the EVs according to the weight values. The proposed scheme is simulated for 100 EVs and 5 CSs using Matlab. The simulation result shows about 30% improvement in the average utilization of CSs as compared to first-come-first-serve (FCFS) based scheme.
- [483] arXiv:2405.07648 [pdf, ps, other]
-
Title: CDFormer:When Degradation Prediction Embraces Diffusion Model for Blind Image Super-ResolutionSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details. In this paper, we propose a novel BSR approach, Content-aware Degradation-driven Transformer (CDFormer), to capture both degradation and content representations. However, low-resolution images cannot provide enough content details, and thus we introduce a diffusion-based module $CDFormer_{diff}$ to first learn Content Degradation Prior (CDP) in both low- and high-resolution images, and then approximate the real distribution given only low-resolution information. Moreover, we apply an adaptive SR network $CDFormer_{SR}$ that effectively utilizes CDP to refine features. Compared to previous diffusion-based SR methods, we treat the diffusion model as an estimator that can overcome the limitations of expensive sampling time and excessive diversity. Experiments show that CDFormer can outperform existing methods, establishing a new state-of-the-art performance on various benchmarks under blind settings. Codes and models will be available at \href{this https URL}{this https URL}.
- [484] arXiv:2405.07652 [pdf, ps, other]
-
Title: G-VOILA: Gaze-Facilitated Information Querying in Daily ScenariosComments: 25 pages, 12 figuresSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze -- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables -- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we revealed the ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm, which effectively integrates the gaze data with the in-situ querying context. Then we implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrates its effectiveness by achieving both higher objective score and subjective score, compared to a baseline without gaze data. We further conducted interviews and provided insights for future gaze-facilitated information querying systems.
- [485] arXiv:2405.07653 [pdf, ps, other]
-
Title: Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance KeyingComments: 32. International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision'2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Deep Neural Networks (DNNs) require large amounts of annotated training data for a good performance. Often this data is generated using manual labeling (error-prone and time-consuming) or rendering (requiring geometry and material information). Both approaches make it difficult or uneconomic to apply them to many small-scale applications. A fast and straightforward approach of acquiring the necessary training data would allow the adoption of deep learning to even the smallest of applications. Chroma keying is the process of replacing a color (usually blue or green) with another background. Instead of chroma keying, we propose luminance keying for fast and straightforward training image acquisition. We deploy a black screen with high light absorption (99.99\%) to record roughly 1-minute long videos of our target objects, circumventing typical problems of chroma keying, such as color bleeding or color overlap between background color and object color. Next we automatically mask our objects using simple brightness thresholding, saving the need for manual annotation. Finally, we automatically place the objects on random backgrounds and train a 2D object detector. We do extensive evaluation of the performance on the widely-used YCB-V object set and compare favourably to other conventional techniques such as rendering, without needing 3D meshes, materials or any other information of our target objects and in a fraction of the time needed for other approaches. Our work demonstrates highly accurate training data acquisition allowing to start training state-of-the-art networks within minutes.
- [486] arXiv:2405.07655 [pdf, ps, other]
-
Title: Quality-aware Selective Fusion Network for V-D-T Salient Object DetectionLiuxin Bao, Xiaofei Zhou, Xiankai Lu, Yaoqi Sun, Haibing Yin, Zhenghui Hu, Jiyong Zhang, Chenggang YanComments: Accepted by IEEE Transactions on Image Processing (TIP)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Depth images and thermal images contain the spatial geometry information and surface temperature information, which can act as complementary information for the RGB modality. However, the quality of the depth and thermal images is often unreliable in some challenging scenarios, which will result in the performance degradation of the two-modal based salient object detection (SOD). Meanwhile, some researchers pay attention to the triple-modal SOD task, where they attempt to explore the complementarity of the RGB image, the depth image, and the thermal image. However, existing triple-modal SOD methods fail to perceive the quality of depth maps and thermal images, which leads to performance degradation when dealing with scenes with low-quality depth and thermal images. Therefore, we propose a quality-aware selective fusion network (QSF-Net) to conduct VDT salient object detection, which contains three subnets including the initial feature extraction subnet, the quality-aware region selection subnet, and the region-guided selective fusion subnet. Firstly, except for extracting features, the initial feature extraction subnet can generate a preliminary prediction map from each modality via a shrinkage pyramid architecture. Then, we design the weakly-supervised quality-aware region selection subnet to generate the quality-aware maps. Concretely, we first find the high-quality and low-quality regions by using the preliminary predictions, which further constitute the pseudo label that can be used to train this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, and then fuses the triple-modal features and refines the edge details of prediction maps through the intra-modality and inter-modality attention (IIA) module and the edge refinement (ER) module, respectively. Extensive experiments are performed on VDT-2048
- [487] arXiv:2405.07656 [pdf, ps, other]
-
Title: Non-Rigid Designators in Modal and Temporal Free Description Logics (Extended Version)Subjects: Logic in Computer Science (cs.LO)
Definite descriptions, such as 'the General Chair of KR 2024', are a semantically transparent device for object identification in knowledge representation. In first-order modal logic, definite descriptions have been widely investigated for their non-rigidity, which allows them to designate different objects (or none at all) at different states. We propose expressive modal description logics with non-rigid definite descriptions and names, and investigate decidability and complexity of the satisfaction problem. We first systematically link satisfiability for the one-variable fragment of first-order modal logic with counting to our modal description logics. Then, we prove a promising NEXPTIME-completeness result for concept satisfiability for the fundamental epistemic multi-agent logic $\mathbf{S5}^{n}$ and its neighbours, and show that some expressive logics that are undecidable with constant domain become decidable (but Ackermann-hard) with expanding domains. Finally, we conduct a fine-grained analysis of decidability of temporal logics.
- [488] arXiv:2405.07658 [pdf, ps, other]
-
Title: Understanding Data Understanding: A Framework to Navigate the Intricacies of Data AnalyticsComments: Accepted at 32nd European Conference on Information Systems (2024)Subjects: Human-Computer Interaction (cs.HC)
As organizations face the challenges of processing exponentially growing data volumes, their reliance on analytics to unlock value from this data has intensified. However, the intricacies of big data, such as its extensive feature sets, pose significant challenges. A crucial step in leveraging this data for insightful analysis is an in-depth understanding of both the data and its domain. Yet, existing literature presents a fragmented picture of what comprises an effective understanding of data and domain, varying significantly in depth and focus. To address this research gap, we conduct a systematic literature review, aiming to delineate the dimensions of data understanding. We identify five dimensions: Foundations, Collection & Selection, Contextualization & Integration, Exploration & Discovery, and Insights. These dimensions collectively form a comprehensive framework for data understanding, providing guidance for organizations seeking meaningful insights from complex datasets. This study synthesizes the current state of knowledge and lays the groundwork for further exploration.
- [489] arXiv:2405.07662 [pdf, ps, other]
-
Title: Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification ApplicationsComments: ICLR 2024 Workshop on Practical ML for Low Resource SettingsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Many industry verticals are confronted with small-sized tabular data. In this low-data regime, it is currently unclear whether the best performance can be expected from simple baselines, or more complex machine learning approaches that leverage meta-learning and ensembling. On 44 tabular classification datasets with sample sizes $\leq$ 500, we find that L2-regularized logistic regression performs similar to state-of-the-art automated machine learning (AutoML) frameworks (AutoPrognosis, AutoGluon) and off-the-shelf deep neural networks (TabPFN, HyperFast) on the majority of the benchmark datasets. We therefore recommend to consider logistic regression as the first choice for data-scarce applications with tabular data and provide practitioners with best practices for further method selection.
- [490] arXiv:2405.07663 [pdf, ps, other]
-
Title: Sign Stitching: A Novel Approach to Sign Language ProductionComments: 18 pages, 3 figures, 4 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Sign Language Production (SLP) is a challenging task, given the limited resources available and the inherent diversity within sign data. As a result, previous works have suffered from the problem of regression to the mean, leading to under-articulated and incomprehensible signing. In this paper, we propose using dictionary examples and a learnt codebook of facial expressions to create expressive sign language sequences. However, simply concatenating signs and adding the face creates robotic and unnatural sequences. To address this we present a 7-step approach to effectively stitch sequences together. First, by normalizing each sign into a canonical pose, cropping, and stitching we create a continuous sequence. Then, by applying filtering in the frequency domain and resampling each sign, we create cohesive natural sequences that mimic the prosody found in the original data. We leverage a SignGAN model to map the output to a photo-realistic signer and present a complete Text-to-Sign (T2S) SLP pipeline. Our evaluation demonstrates the effectiveness of the approach, showcasing state-of-the-art performance across all datasets. Finally, a user evaluation shows our approach outperforms the baseline model and is capable of producing realistic sign language sequences.
- [491] arXiv:2405.07664 [pdf, ps, other]
-
Title: Geospatial Knowledge GraphsSubjects: Artificial Intelligence (cs.AI)
Geospatial knowledge graphs have emerged as a novel paradigm for representing and reasoning over geospatial information. In this framework, entities such as places, people, events, and observations are depicted as nodes, while their relationships are represented as edges. This graph-based data format lays the foundation for creating a "FAIR" (Findable, Accessible, Interoperable, and Reusable) environment, facilitating the management and analysis of geographic information. This entry first introduces key concepts in knowledge graphs along with their associated standardization and tools. It then delves into the application of knowledge graphs in geography and environmental sciences, emphasizing their role in bridging symbolic and subsymbolic GeoAI to address cross-disciplinary geospatial challenges. At the end, new research directions related to geospatial knowledge graphs are outlined.
- [492] arXiv:2405.07665 [pdf, ps, html, other]
-
Title: Partial information decomposition as information bottleneckSubjects: Information Theory (cs.IT); Machine Learning (stat.ML)
The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provide about a target. Here we show that this goal can be formulated as a type of information bottleneck (IB) problem, which we term the "redundancy bottleneck" (RB). The RB formalizes a tradeoff between prediction and compression: it extracts information from the sources that predicts the target, without revealing which source provided the information. It can be understood as a generalization "Blackwell redundancy", which we previously proposed as a principled measure of PID redundancy. The "RB curve" quantifies the prediction/compression tradeoff at multiple scales. This curve can also be quantified for individual sources, allowing subsets of redundant sources to be identified without combinatorial optimization. We provide an efficient iterative algorithm for computing the RB curve.
- [493] arXiv:2405.07666 [pdf, ps, other]
-
Title: New Solutions to Delsarte's Dual Linear ProgramsSubjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)
Understanding the maximum size of a code with a given minimum distance is a major question in computer science and discrete mathematics. The most fruitful approach for finding asymptotic bounds on such codes is by using Delsarte's theory of association schemes. With this approach, Delsarte constructs a linear program such that its maximum value is an upper bound on the maximum size of a code with a given minimum distance. Bounding this value can be done by finding solutions to the corresponding dual linear program. Delsarte's theory is very general and goes way beyond binary codes.
In this work, we provide universal bounds in the framework of association schemes that generalize the Hamming bound and the Elias-Bassalygo bound, which can be applied to any association scheme constructed from a distance function. These bounds are obtained by constructing new solutions to Delsarte's dual linear program. We instantiate these results and we recover known bounds for $q$-ary codes and for constant-weight binary codes but which didn't come from the linear program method. Our other contribution is to recover, for essentially any $Q$-polynomial scheme, MRRW-type solutions to Delsarte's dual linear program which are inspired by the Laplacian approach of Friedman and Tillich instead of using the Christoffel-Darboux formulas. We show in particular how the second linear programming bound can be interpreted in this framework. - [494] arXiv:2405.07667 [pdf, ps, other]
-
Title: Backdoor Removal for Generative Large Language ModelsSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
With rapid advances, generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks from understanding to reasoning. Yet, language models' inherent vulnerabilities may be exacerbated due to increased accessibility and unrestricted model training on massive textual data from the Internet. A malicious adversary may publish poisoned data online and conduct backdoor attacks on the victim LLMs pre-trained on the poisoned data. Backdoored LLMs behave innocuously for normal queries and generate harmful responses when the backdoor trigger is activated. Despite significant efforts paid to LLMs' safety issues, LLMs are still struggling against backdoor attacks. As Anthropic recently revealed, existing safety training strategies, including supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), fail to revoke the backdoors once the LLM is backdoored during the pre-training stage. In this paper, we present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings for generative LLMs. We initially propose Overwrite Supervised Fine-tuning (OSFT) for effective backdoor removal when the trigger is known. Then, to handle the scenarios where the trigger patterns are unknown, we integrate OSFT into our two-stage framework, SANDE. Unlike previous works that center on the identification of backdoors, our safety-enhanced LLMs are able to behave normally even when the exact triggers are activated. We conduct comprehensive experiments to show that our proposed SANDE is effective against backdoor attacks while bringing minimal harm to LLMs' powerful capability without any additional access to unbackdoored clean models. We will release the reproducible code.
- [495] arXiv:2405.07668 [pdf, ps, other]
-
Title: CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning ModelsComments: 23 pages, 2 figures, accepted by FSE 2024 (The ACM International Conference on the Foundations of Software Engineering)Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim to label malicious samples with provable guarantees correctly and issue warnings for malicious samples predicted to non-benign labels with provable guarantees, respectively. However, existing certified detection defenders suffer from protecting labels subject to manipulation, and existing certified recovery defenders cannot systematically warn samples about their labels. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert formulates a novel approach by cross-checking two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures that a certified sample, when subjected to a patched perturbation, will always be returned with a benign label without triggering any warnings with a provable guarantee. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with a slightly lower performance than ViP and comparable performance with PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification.
- [496] arXiv:2405.07670 [pdf, ps, other]
-
Title: Impact of white Gaussian internal noise on analog echo-state neural networksComments: 10 pages 8 figuresSubjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
In recent years, more and more works have appeared devoted to the analog (hardware) implementation of artificial neural networks, in which neurons and the connection between them are based not on computer calculations, but on physical principles. Such networks offer improved energy efficiency and, in some cases, scalability, but may be susceptible to internal noise. This paper studies the influence of noise on the functioning of recurrent networks using the example of trained echo state networks (ESNs). The most common reservoir connection matrices were chosen as various topologies of ESNs: random uniform and band matrices with different connectivity. White Gaussian noise was chosen as the influence, and according to the way of its introducing it was additive or multiplicative, as well as correlated or uncorrelated. In the paper, we show that the propagation of noise in reservoir is mainly controlled by the statistical properties of the output connection matrix, namely the mean and the mean square. Depending on these values, more correlated or uncorrelated noise accumulates in the network. We also show that there are conditions under which even noise with an intensity of $10^{-20}$ is already enough to completely lose the useful signal. In the article we show which types of noise are most critical for networks with different activation functions (hyperbolic tangent, sigmoid and linear) and if the network is self-closed.
- [497] arXiv:2405.07671 [pdf, ps, other]
-
Title: Constructing a BPE Tokenization DFASubjects: Formal Languages and Automata Theory (cs.FL); Computation and Language (cs.CL); Machine Learning (cs.LG)
Many natural language processing systems operate over tokenizations of text to address the open-vocabulary problem. In this paper, we give and analyze an algorithm for the efficient construction of deterministic finite automata designed to operate directly on tokenizations produced by the popular byte pair encoding technique. This makes it possible to apply many existing techniques and algorithms to the tokenized case, such as pattern matching, equivalence checking of tokenization dictionaries, and composing tokenized languages in various ways.
- [498] arXiv:2405.07673 [pdf, ps, other]
-
Title: An Empirical Study on the Robustness of Massively Multilingual Neural Machine TranslationComments: 12 pages, 6 figuresSubjects: Computation and Language (cs.CL)
Massively multilingual neural machine translation (MMNMT) has been proven to enhance the translation quality of low-resource languages. In this paper, we empirically investigate the translation robustness of Indonesian-Chinese translation in the face of various naturally occurring noise. To assess this, we create a robustness evaluation benchmark dataset for Indonesian-Chinese translation. This dataset is automatically translated into Chinese using four NLLB-200 models of different sizes. We conduct both automatic and human evaluations. Our in-depth analysis reveal the correlations between translation error types and the types of noise present, how these correlations change across different model sizes, and the relationships between automatic evaluation indicators and human evaluation indicators. The dataset is publicly available at this https URL.
- [499] arXiv:2405.07679 [pdf, ps, other]
-
Title: Class-wise Activation Unravelling the Engima of Deep Double DescentComments: arXiv admin note: text overlap with arXiv:2310.13572Subjects: Machine Learning (cs.LG)
Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon in specific contexts, an accepted theory for its occurring mechanism in deep learning remains yet to be established. In this study, we revisited the phenomenon of double descent and discussed the conditions of its occurrence. This paper introduces the concept of class-activation matrices and a methodology for estimating the effective complexity of functions, on which we unveil that over-parameterized models exhibit more distinct and simpler class patterns in hidden activations compared to under-parameterized ones. We further looked into the interpolation of noisy labelled data among clean representations and demonstrated overfitting w.r.t. expressive capacity. By comprehensively analysing hypotheses and presenting corresponding empirical evidence that either validates or contradicts these hypotheses, we aim to provide fresh insights into the phenomenon of double descent and benign over-parameterization and facilitate future explorations. By comprehensively studying different hypotheses and the corresponding empirical evidence either supports or challenges these hypotheses, our goal is to offer new insights into the phenomena of double descent and benign over-parameterization, thereby enabling further explorations in the field. The source code is available at this https URL.
- [500] arXiv:2405.07680 [pdf, ps, other]
-
Title: Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of MetricsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The development of generative artificial intelligence for human motion generation has expanded rapidly, necessitating a unified evaluation framework. This paper presents a detailed review of eight evaluation metrics for human motion generation, highlighting their unique features and shortcomings. We propose standardized practices through a unified evaluation setup to facilitate consistent model comparisons. Additionally, we introduce a novel metric that assesses diversity in temporal distortion by analyzing warping diversity, thereby enhancing the evaluation of temporal data. We also conduct experimental analyses of three generative models using a publicly available dataset, offering insights into the interpretation of each metric in specific case scenarios. Our goal is to offer a clear, user-friendly evaluation framework for newcomers, complemented by publicly accessible code.
- [501] arXiv:2405.07682 [pdf, ps, other]
-
Title: FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment GenerationComments: IJCAI 2024Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Singing Accompaniment Generation (SAG), which generates instrumental music to accompany input vocals, is crucial to developing human-AI symbiotic art creation systems. The state-of-the-art method, SingSong, utilizes a multi-stage autoregressive (AR) model for SAG, however, this method is extremely slow as it generates semantic and acoustic tokens recursively, and this makes it impossible for real-time applications. In this paper, we aim to develop a Fast SAG method that can create high-quality and coherent accompaniments. A non-AR diffusion-based framework is developed, which by carefully designing the conditions inferred from the vocal signals, generates the Mel spectrogram of the target accompaniment directly. With diffusion and Mel spectrogram modeling, the proposed method significantly simplifies the AR token-based SingSong framework, and largely accelerates the generation. We also design semantic projection, prior projection blocks as well as a set of loss functions, to ensure the generated accompaniment has semantic and rhythm coherence with the vocal signal. By intensive experimental studies, we demonstrate that the proposed method can generate better samples than SingSong, and accelerate the generation by at least 30 times. Audio samples and code are available at this https URL.
- [502] arXiv:2405.07685 [pdf, ps, other]
-
Title: Comprehensive Analysis of Access Control Models in Edge Computing: Challenges, Solutions, and Future DirectionsSubjects: Systems and Control (eess.SY)
Many contemporary applications, including smart homes and autonomous vehicles, rely on the Internet of Things technology. While cloud computing provides a multitude of valuable services for these applications, it generally imposes constraints on latency-sensitive applications due to the significant propagation delays. As a complementary technique to cloud computing, edge computing situates computing resources closer to the data sources, which reduces the latency and simultaneously alleviates the bandwidth pressure for the cloud and enhances data security. While edge computing offers significant advantages, it also presents significant challenges in access control -- a critical component for safeguarding data. For instance, it is crucial to implement access control mechanisms that are both effective and efficient on resource-constrained devices, ensuring high security without compromising the inherent low latency benefits of edge computing. These challenges drive the development of innovative access control solutions tailored to meet the unique requirements of edge computing environments. We classify related references from the perspectives of multiple data lifecycles (including data collection, storage, and usage), which thoroughly investigates the access control techniques and helps readers understand them systematically. Finally, we reflect on the classification and envisage future research directions.
- [503] arXiv:2405.07687 [pdf, ps, other]
-
Title: Highly Efficient Observation Process based on FFT Filtering for Robot Swarm Collaborative Navigation in Unknown EnvironmentsComments: 8 pages, 8 figures, 1 tableSubjects: Robotics (cs.RO)
Collaborative path planning for robot swarms in complex, unknown environments without external positioning is a challenging problem. This requires robots to find safe directions based on real-time environmental observations, and to efficiently transfer and fuse these observations within the swarm. This study presents a filtering method based on Fast Fourier Transform (FFT) to address these two issues. We treat sensors' environmental observations as a digital sampling process. Then, we design two different types of filters for safe direction extraction, as well as for the compression and reconstruction of environmental data. The reconstructed data is mapped to probabilistic domain, achieving efficient fusion of swarm observations and planning decision. The computation time is only on the order of microseconds, and the transmission data in communication systems is in bit-level. The performance of our algorithm in sensor data processing was validated in real world experiments, and the effectiveness in swarm path optimization was demonstrated through extensive simulations.
- [504] arXiv:2405.07689 [pdf, ps, other]
-
Title: Quality of Experience Optimization for Real-time XR Video Transmission with Energy ConstraintsComments: 6 pages, 5 figuresSubjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Extended Reality (XR) is an important service in the 5G network and in future 6G networks. In contrast to traditional video on demand services, real-time XR video is transmitted frame-by-frame, requiring low latency and being highly sensitive to network fluctuations. In this paper, we model the quality of experience (QoE) for real-time XR video transmission on a frame-by-frame basis. Based on the proposed QoE model, we formulate an optimization problem that maximizes QoE with constraints on wireless resources and long-term energy consumption. We utilize Lyapunov optimization to transform the original problem into a single-frame optimization problem and then allocate wireless subchannels. We propose an adaptive XR video bitrate algorithm that employs a Long Short Term Memory (LSTM) based Deep Q-Network (DQN) algorithm for video bitrate selection. Through numerical results, we show that our proposed algorithm outperforms the baseline algorithms, with the average QoE improvements of 0.04 to 0.46. Specifically, compared to baseline algorithms, the proposed algorithm reduces average video quality variations by 29% to 50% and improves the frame transmission success rate by 5% to 48%.
- [505] arXiv:2405.07690 [pdf, ps, other]
-
Title: Convergence analysis of three semi-discrete numerical schemes for nonlocal geometric flows including perimeter termsComments: 34 pages, 9 figuresSubjects: Numerical Analysis (math.NA)
We present and analyze three distinct semi-discrete schemes for solving nonlocal geometric flows incorporating perimeter terms. These schemes are based on the finite difference method, the finite element method, and the finite element method with a specific tangential motion. We offer rigorous proofs of quadratic convergence under $H^1$-norm for the first scheme and linear convergence under $H^1$-norm for the latter two schemes. All error estimates rely on the observation that the error of the nonlocal term can be controlled by the error of the local term. Furthermore, we explore the relationship between the convergence under $L^\infty$-norm and manifold distance. Extensive numerical experiments are conducted to verify the convergence analysis, and demonstrate the accuracy of our schemes under various norms for different types of nonlocal flows.
- [506] arXiv:2405.07696 [pdf, ps, other]
-
Title: MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked AutoencodersSubjects: Computer Vision and Pattern Recognition (cs.CV)
Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite its recent progress, it often struggles while handling pervasive object occlusions that tend to complicate and degrade the prediction of object dimensions, depths, and orientations. We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders that addresses the object occlusion issue by masking and reconstructing objects in the feature space. MonoMAE consists of two novel designs. The first is depth-aware masking that selectively masks certain parts of non-occluded object queries in the feature space for simulating occluded object queries for network training. It masks non-occluded object queries by balancing the masked and preserved query portions adaptively according to the depth information. The second is lightweight query completion that works with the depth-aware masking to learn to reconstruct and complete the masked object queries. With the proposed object occlusion and completion, MonoMAE learns enriched 3D representations that achieve superior monocular 3D detection performance qualitatively and quantitatively for both occluded and non-occluded objects. Additionally, MonoMAE learns generalizable representations that can work well in new domains.
- [507] arXiv:2405.07697 [pdf, ps, html, other]
-
Title: Practical Short-Length Coding Schemes for Binary Distributed Hypothesis TestingComments: Accepted at ISIT 2024Subjects: Information Theory (cs.IT)
This paper investigates practical coding schemes for Distributed Hypothesis Testing (DHT). While the literature has extensively analyzed the information-theoretic performance of DHT and established bounds on Type-II error exponents through quantize and quantize-binning achievability schemes, the practical implementation of DHT coding schemes has not yet been investigated. Therefore, this paper introduces practical implementations of quantizers and quantize-binning schemes for DHT, leveraging short-length binary linear block codes. Furthermore, it provides exact analytical expressions for Type-I and Type-II error probabilities associated with each proposed coding scheme. Numerical results show the accuracy of the proposed analytical error probability expressions, and enable to compare the performance of the proposed schemes.
- [508] arXiv:2405.07698 [pdf, ps, html, other]
-
Title: oTTC: Object Time-to-Contact for Motion Estimation in Autonomous DrivingComments: 9 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Autonomous driving systems require a quick and robust perception of the nearby environment to carry out their routines effectively. With the aim to avoid collisions and drive safely, autonomous driving systems rely heavily on object detection. However, 2D object detections alone are insufficient; more information, such as relative velocity and distance, is required for safer planning. Monocular 3D object detectors try to solve this problem by directly predicting 3D bounding boxes and object velocities given a camera image. Recent research estimates time-to-contact in a per-pixel manner and suggests that it is more effective measure than velocity and depth combined. However, per-pixel time-to-contact requires object detection to serve its purpose effectively and hence increases overall computational requirements as two different models need to run. To address this issue, we propose per-object time-to-contact estimation by extending object detection models to additionally predict the time-to-contact attribute for each object. We compare our proposed approach with existing time-to-contact methods and provide benchmarking results on well-known datasets. Our proposed approach achieves higher precision compared to prior art while using a single image.
- [509] arXiv:2405.07700 [pdf, ps, other]
-
Title: Age-Dependent Analysis and Stochastic Generation of Child-Directed SpeechComments: Accepted for publication in Proc. 45th Annual Meeting of the Cognitive Science Society (CogSci-2024)Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Child-directed speech (CDS) is a particular type of speech that adults use when addressing young children. Its properties also change as a function of extralinguistic factors, such as age of the child being addressed. Access to large amounts of representative and varied CDS would be useful for child language research, as this would enable controlled computational modeling experiments of infant language acquisition with realistic input in terms of quality and quantity. In this study, we describe an approach to model age-dependent linguistic properties of CDS using a language model (LM) trained on CDS transcripts and ages of the recipient children, as obtained from North American English corpora of the CHILDES database. The created LM can then be used to stochastically generate synthetic CDS transcripts in an age-appropriate manner, thereby scaling beyond the original datasets in size. We compare characteristics of the generated CDS against the real speech addressed at children of different ages, showing that the LM manages to capture age-dependent changes in CDS, except for a slight difference in the effective vocabulary size. As a side product, we also provide a systematic characterization of age-dependent linguistic properties of CDS in CHILDES, illustrating how all measured aspects of the CDS change with children's age.
- [510] arXiv:2405.07702 [pdf, ps, other]
-
Title: FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer SurvivalSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Integrating the different data modalities of cancer patients can significantly improve the predictive performance of patient survival. However, most existing methods ignore the simultaneous utilization of rich semantic features at different scales in pathology images. When collecting multimodal data and extracting features, there is a likelihood of encountering intra-modality missing data, introducing noise into the multimodal data. To address these challenges, this paper proposes a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information. Specifically, the cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis through a cross-scale feature cross-fusion method. This enhances the ability of pathological image feature representation. Secondly, the hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features and local detail features of the molecular data. HAE's channel attention module obtains global features of molecular data. Furthermore, to address the issue of missing information within modalities, we propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods on four benchmark datasets in both complete and missing settings.
- [511] arXiv:2405.07703 [pdf, ps, other]
-
Title: OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs trained starting from Llama 2Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian RebedeaSubjects: Computation and Language (cs.CL)
In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specialized for Romanian.
- [512] arXiv:2405.07708 [pdf, ps, other]
-
Title: Secure Aggregation Meets Sparsification in Decentralized LearningSubjects: Machine Learning (cs.LG)
Decentralized learning (DL) faces increased vulnerability to privacy breaches due to sophisticated attacks on machine learning (ML) models. Secure aggregation is a computationally efficient cryptographic technique that enables multiple parties to compute an aggregate of their private data while keeping their individual inputs concealed from each other and from any central aggregator. To enhance communication efficiency in DL, sparsification techniques are used, selectively sharing only the most crucial parameters or gradients in a model, thereby maintaining efficiency without notably compromising accuracy. However, applying secure aggregation to sparsified models in DL is challenging due to the transmission of disjoint parameter sets by distinct nodes, which can prevent masks from canceling out effectively. This paper introduces CESAR, a novel secure aggregation protocol for DL designed to be compatible with existing sparsification mechanisms. CESAR provably defends against honest-but-curious adversaries and can be formally adapted to counteract collusion between them. We provide a foundational understanding of the interaction between the sparsification carried out by the nodes and the proportion of the parameters shared under CESAR in both colluding and non-colluding environments, offering analytical insight into the working and applicability of the protocol. Experiments on a network with 48 nodes in a 3-regular topology show that with random subsampling, CESAR is always within 0.5% accuracy of decentralized parallel stochastic gradient descent (D-PSGD), while adding only 11% of data overhead. Moreover, it surpasses the accuracy on TopK by up to 0.3% on independent and identically distributed (IID) data.
- [513] arXiv:2405.07710 [pdf, ps, other]
-
Title: Waste Factor and Waste Figure: A Unified Theory for Modeling and Analyzing Wasted Power in Radio Access Networks for Improved SustainabilityComments: 26 pages, 21 figures, 5 tablesSubjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
This paper introduces Waste Factor (W), also denoted as Waste Figure (WF) in dB, a promising new metric for quantifying energy efficiency in a wide range of circuits and systems applications, including data centers and RANs. Also, the networks used to connect data centers and AI computing engines with users for ML applications must become more power efficient. This paper illustrates the limitations of existing energy efficiency metrics that inadequately capture the intricate energy dynamics of RAN components. We delineate the methodology for applying W across various network configurations, including MISO, SIMO, and MIMO systems, and demonstrate the effectiveness of W in identifying energy optimization opportunities. Our findings reveal that W not only offers nuanced insights into the energy performance of RANs but also facilitates informed decision-making for network design and operational efficiency. Furthermore, we show how W can be integrated with other KPIs to guide the development of optimal strategies for enhancing network energy efficiency under different operational conditions. Additionally, we present simulation results for a distributed multi-user MIMO system at 3.5, 17, and 28 GHz, demonstrating overall network power efficiency on a per square kilometer basis, and show how overall W decreases with an increasing number of base stations and increasing carrier frequency. This paper shows that adopting W as a figure of merit can significantly contribute to the sustainability and energy optimization of next-generation wireless communication networks, paving the way for greener and more sustainable, energy-efficient 5G and 6G technologies.
- [514] arXiv:2405.07714 [pdf, ps, other]
-
Title: Joint Robotic Aerial Base Station Deployment and Wireless Backhauling in 6G Multi-hop NetworksComments: 6 pages conference and 6 figuresSubjects: Networking and Internet Architecture (cs.NI)
Due to their ability to anchor into tall urban landforms, such as lampposts or street lights, robotic aerial base stations (RABSs) can create a hyper-flexible wireless multi-hop heterogeneous network to meet the forthcoming green, densified, and dynamic network deployment to support, inter alia, high data rates. In this work, we propose a network infrastructure that can concurrently support the wireless backhaul link capacity and access link traffic demand in the millimeter-wave (mmWave) frequency band. The RABSs grasping locations, resource blocks (RBs) assignment, and route flow control are simultaneously optimized to maximize the served traffic demands. Robotic base stations capitalize on the fact that traffic distribution varies considerably across both time and space within a given geographical area. Hence, they are able to relocate to suitable locations, i.e., 'follow' the traffic demand as it unfolds to increase the overall network efficiency. To tackle the curse of dimensionality of the proposed mixed-integer linear problem, we propose a greedy algorithm to obtain a competitive solution with low computational complexity. Compared to baseline models, which are heterogeneous networks with randomly deployed fixed small cells and pre-allocated RBs for wireless access and backhaul links, a wide set of numerical investigations reveals that robotic base stations could improve the served traffic demand. Specifically, the proposed mode serves at most 65\% more traffic demand compared to an equal number of deployed fixed small cells.
- [515] arXiv:2405.07715 [pdf, ps, other]
-
Title: Evidence of What, for Whom? The Socially Contested Role of Algorithmic Bias in a Predictive Policing ToolComments: Accepted to FAccT '24Subjects: Computers and Society (cs.CY)
This paper presents a critical, qualitative study of the social role of algorithmic bias in the context of the Chicago crime prediction algorithm, a predictive policing tool that forecasts when and where in the city crime is most likely to occur. Through interviews with 18 Chicago-area community organizations, academic researchers, and public sector actors, we show that stakeholders from different groups articulate diverse problem diagnoses of the tool's algorithmic bias, strategically using it as evidence to advance criminal justice interventions that align with stakeholders' positionality and political ends. Drawing inspiration from Catherine D'Ignazio's taxonomy of "refusing and using" data, we find that stakeholders use evidence of algorithmic bias to reform the policies around police patrol allocation; reject algorithm-based policing interventions; reframe crime as a structural rather than interpersonal problem; reveal data on authority figures in an effort to subvert their power; repair and heal families and communities; and, in the case of more powerful actors, to reaffirm their own authority or existing power structures. We identify the implicit assumptions and scope of these varied uses of algorithmic bias as evidence, showing that they require different (and sometimes conflicting) values about policing and AI. This divergence reflects long-standing tensions in the criminal justice reform landscape between the values of liberation and healing often centered by system-impacted communities and the values of surveillance and deterrence often instantiated in data-driven reform measures. We advocate for centering the interests and experiential knowledge of communities impacted by incarceration to ensure that evidence of algorithmic bias can serve as a device to challenge the status quo.
- [516] arXiv:2405.07718 [pdf, ps, html, other]
-
Title: Contract-Based Design for Hybrid Dynamical Systems and Invariance PropertiesComments: Accepted ADHS 2024Subjects: Systems and Control (eess.SY)
This work establishes fundamental principles for verifying contract for interconnected hybrid systems. When system's hybrid arcs conform to the contract for a certain duration but subsequently violate it, the composition of hybrid dynamical systems becomes challenging. The objective of this work is to analyze the temporal satisfaction of the contract, allowing us to reason about the compositions that do not violate the contract up to a certain point in a hybrid time. Notions of weak and strong satisfaction of an assume-guarantee contract are introduced. These semantics permits the compositional reasoning on hybrid systems of varying complexity depending on the interconnection's type, feedback or cascade. The results show that both semantics are compatible with cascade composition, while strong semantic is required for feedback composition. Moreover, we have shown how one can go from weak to strong contract satisfaction. Finally, we have studied a particular class of hybrid systems and we have shown that the concept of forward (pre-)invariant relative to a contract makes it possible to deal with feedback compositions. These results are demonstrated throughout the paper with simple numerical examples.
- [517] arXiv:2405.07719 [pdf, ps, other]
-
Title: A Unified Sequence Parallelism Approach for Long Context Generative AIComments: 12 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Sequence parallelism (SP), which divides the sequence dimension of input tensors across multiple computational devices, is becoming key to unlocking the long-context capabilities of generative AI models. This paper investigates the state-of-the-art SP approaches, i.e. DeepSpeed-Ulysses and Ring-Attention, and proposes a unified SP approach, which is more robust to transformer model architectures and network hardware topology. This paper compares the communication and memory cost of SP and existing parallelism, including data/tensor/zero/expert/pipeline parallelism, and discusses the best practices for designing hybrid 4D parallelism involving SP. We achieved 86\% MFU on two 8xA800 nodes using SP for sequence length 208K for the LLAMA3-8B model. Our code is publicly available on \url{this https URL}.
- [518] arXiv:2405.07723 [pdf, ps, other]
-
Title: Coarse or Fine? Recognising Action End States without LabelsComments: The Eleventh Workshop on Fine-Grained Visual Categorization (CVPR 24)Subjects: Computer Vision and Pattern Recognition (cs.CV)
We focus on the problem of recognising the end state of an action in an image, which is critical for understanding what action is performed and in which manner. We study this focusing on the task of predicting the coarseness of a cut, i.e., deciding whether an object was cut "coarsely" or "finely". No dataset with these annotated end states is available, so we propose an augmentation method to synthesise training data. We apply this method to cutting actions extracted from an existing action recognition dataset. Our method is object agnostic, i.e., it presupposes the location of the object but not its identity. Starting from less than a hundred images of a whole object, we can generate several thousands images simulating visually diverse cuts of different coarseness. We use our synthetic data to train a model based on UNet and test it on real images showing coarsely/finely cut objects. Results demonstrate that the model successfully recognises the end state of the cutting action despite the domain gap between training and testing, and that the model generalises well to unseen objects.
- [519] arXiv:2405.07725 [pdf, ps, other]
-
Title: Decentralized Distributed Graph Coloring: Cluster GraphsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Graph coloring is fundamental to distributed computing. We give an ultrafast distributed algorithm for coloring cluster graphs. These graphs are obtained from the underlying communication network by contracting nodes and edges, and they appear frequently as components in the study of distributed algorithms. In particular, we give a $O(\log^* n)$-round algorithm to $\Delta+1$-color cluster graphs of at least polylogarithmic degree. The previous best bound known was $poly(\log n)$ [Flin this http URL, SODA'24]. This properly generalizes results in the COONGEST model and shows that distributed graph problems can be quickly solved even when the node itself is decentralized.
- [520] arXiv:2405.07726 [pdf, ps, other]
-
Title: Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playingSubjects: Computation and Language (cs.CL)
Persona-driven role-playing (PRP) aims to build AI characters that can respond to user queries by faithfully sticking with all persona statements. Unfortunately, existing faithfulness criteria for PRP are limited to coarse-grained LLM-based scoring without a clear definition or formulation. This paper presents a pioneering exploration to quantify PRP faithfulness as a fine-grained and explainable criterion, which also serves as a reliable reference for optimization. Our criterion first discriminates persona statements into active and passive constraints by identifying the query-statement relevance. Then, we incorporate all constraints following the principle that the AI character's response should be (a) entailed by active (relevant) constraints and (b) not contradicted by passive (irrelevant) constraints. We translate this principle mathematically into a novel Active-Passive-Constraint (APC) score, a constraint-wise sum of natural language inference (NLI) scores weighted by relevance scores. In practice, we build the APC scoring system by symbolically distilling small discriminators from GPT-4 for efficiency. We validate the quality of the APC score against human evaluation based on example personas with tens of statements, and the results show a high correlation. We further leverage it as a reward system in direct preference optimization (DPO) for better AI characters. Our experiments offer a fine-grained and explainable comparison between existing PRP techniques, revealing their advantages and limitations. We further find APC-based DPO to be one of the most competitive techniques for sticking with all constraints and can be well incorporated with other techniques. We then extend the scale of the experiments to real persons with hundreds of statements and reach a consistent conclusion.
- [521] arXiv:2405.07730 [pdf, ps, other]
-
Title: Does Dependency Locality Predict Non-canonical Word Order in Hindi?Comments: Accepted at CogSci-2024 with full paper publicationSubjects: Computation and Language (cs.CL)
Previous work has shown that isolated non-canonical sentences with Object-before-Subject (OSV) order are initially harder to process than their canonical counterparts with Subject-before-Object (SOV) order. Although this difficulty diminishes with appropriate discourse context, the underlying cognitive factors responsible for alleviating processing challenges in OSV sentences remain a question. In this work, we test the hypothesis that dependency length minimization is a significant predictor of non-canonical (OSV) syntactic choices, especially when controlling for information status such as givenness and surprisal measures. We extract sentences from the Hindi-Urdu Treebank corpus (HUTB) that contain clearly-defined subjects and objects, systematically permute the preverbal constituents of those sentences, and deploy a classifier to distinguish between original corpus sentences and artificially generated alternatives. The classifier leverages various discourse-based and cognitive features, including dependency length, surprisal, and information status, to inform its predictions. Our results suggest that, although there exists a preference for minimizing dependency length in non-canonical corpus sentences amidst the generated variants, this factor does not significantly contribute in identifying corpus sentences above and beyond surprisal and givenness measures. Notably, discourse predictability emerges as the primary determinant of constituent-order preferences. These findings are further supported by human evaluations involving 44 native Hindi speakers. Overall, this work sheds light on the role of expectation adaptation in word-ordering decisions. We conclude by situating our results within the theories of discourse production and information locality.
- [522] arXiv:2405.07733 [pdf, ps, other]
-
Title: TOPress3D: 3D topology optimization with design-dependent pressure loads in MATLABComments: 10 FiguresSubjects: Computational Engineering, Finance, and Science (cs.CE)
This paper introduces ``TOPress3D," a 3D topology optimization MATLAB code for structures subjected to design-dependent pressure loads. With a primary focus on pedagogical objectives, the code provides an easy learning experience, making it a valuable tool and practical gateway for newcomers, students, and researchers towards this topic. TOPress3D uses Darcy's law with a drainage term to link the given pressure load to design variables, which is converted to consistent nodal loads. Compliance minimization subjected to volume constraint optimization problems with pressure loads are solved. Load sensitivities arising due to design-dependent nature of the loads are evaluated using the adjoint-variable approach. The method of moving asymptotes is used to update the design variables. TOPress3D is constituted by six main parts. Each is described in detail. The code is also tailored to solve different problems. The robustness and success of the code are demonstrated while designing a few pressure load-bearing structures. The code is provided in Appendix B and is available with extensions in the supplementary material and publicly at \url{this https URL}.
- [523] arXiv:2405.07736 [pdf, ps, other]
-
Title: Learning to Plan Maneuverable and Agile Flight Trajectory with Optimization Embedded NetworksComments: this https URLSubjects: Robotics (cs.RO)
In recent times, an increasing number of researchers have been devoted to utilizing deep neural networks for end-to-end flight navigation. This approach has gained traction due to its ability to bridge the gap between perception and planning that exists in traditional methods, thereby eliminating delays between modules. However, the practice of replacing original modules with neural networks in a black-box manner diminishes the overall system's robustness and stability. It lacks principled explanations and often fails to consistently generate high-quality motion trajectories. Furthermore, such methods often struggle to rigorously account for the robot's kinematic constraints, resulting in the generation of trajectories that cannot be executed satisfactorily. In this work, we combine the advantages of traditional methods and neural networks by proposing an optimization-embedded neural network. This network can learn high-quality trajectories directly from visual inputs without the need of mapping, while ensuring dynamic feasibility. Here, the deep neural network is employed to directly extract environment safety regions from depth images. Subsequently, we employ a model-based approach to represent these regions as safety constraints in trajectory optimization. Leveraging the availability of highly efficient optimization algorithms, our method robustly converges to feasible and optimal solutions that satisfy various user-defined constraints. Moreover, we differentiate the optimization process, allowing it to be trained as a layer within the neural network. This approach facilitates the direct interaction between perception and planning, enabling the network to focus more on the spatial regions where optimal solutions exist. As a result, it further enhances the quality and stability of the generated trajectories.
- [524] arXiv:2405.07740 [pdf, ps, other]
-
Title: The $\sigma$ hulls of matrix-product codes and related entanglement-assisted quantum error-correcting codesSubjects: Information Theory (cs.IT)
Let $\mathrm{SLAut}(\mathbb{F}_{q}^{n})$ denote the group of all semilinear isometries on $\mathbb{F}_{q}^{n}$, where $q=p^{e}$ is a prime power. Matrix-product (MP) codes are a class of long classical codes generated by combining several commensurate classical codes with a defining matrix. We give an explicit formula for calculating the dimension of the $\sigma$ hull of a MP code. As a result, we give necessary and sufficient conditions for the MP codes to be $\sigma$ dual-containing and $\sigma$ self-orthogonal. We prove that $\mathrm{dim}_{\mathbb{F}_{q}}(\mathrm{Hull}_{\sigma}(\mathcal{C}))=\mathrm{dim}_{\mathbb{F}_{q}}(\mathrm{Hull}_{\sigma}(\mathcal{C}^{\bot_{\sigma}}))$. We prove that for any integer $h$ with $\mathrm{max}\{0,k_{1}-k_{2}\}\leq h\leq \mathrm{dim}_{\mathbb{F}_{q}}(\mathcal{C}_{1}\cap\mathcal{C}_{2}^{\bot_{\sigma}})$, there exists a linear code $\mathcal{C}_{2,h}$ monomially equivalent to $\mathcal{C}_{2}$ such that $\mathrm{dim}_{\mathbb{F}_{q}}(\mathcal{C}_{1}\cap\mathcal{C}_{2,h}^{\bot_{\sigma}})=h$, where $\mathcal{C}_{i}$ is an $[n,k_{i}]_{q}$ linear code for $i=1,2$. We show that given an $[n,k,d]_{q}$ linear code $\mathcal{C}$, there exists a monomially equivalent $[n,k,d]_{q}$ linear code $\mathcal{C}_{h}$, whose $\sigma$ dual code has minimum distance $d'$, such that there exist an $[[n,k-h,d;n-k-h]]_{q}$ EAQECC and an $[[n,n-k-h,d';k-h]]_{q}$ EAQECC for every integer $h$ with $0\leq h\leq \mathrm{dim}_{\mathbb{F}_{q}}(\mathrm{Hull}_{\sigma}(\mathcal{C}))$. Based on this result, we present a general construction method for deriving EAQECCs with flexible parameters from MP codes related to $\sigma$ hulls.
- [525] arXiv:2405.07744 [pdf, ps, other]
-
Title: MoCo: Fuzzing Deep Learning Libraries via Assembling CodeSubjects: Software Engineering (cs.SE)
The rapidly developing deep learning (DL) techniques have been applied in software systems with various application scenarios. However, they could also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors of DL systems. Previous research on fuzzing DL libraries still has limitations in the diversity of test inputs, the construction of test oracles, and the precision of detection. In this paper, we propose MoCo, a novel fuzzing testing method for DL libraries via assembling code. MoCo first disassembles the seed code file to obtain the template and code blocks, and then employs code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate more new code blocks adapted to the template. By inserting context-appropriate code blocks into the template step by step, MoCo can generate a tree of code files with intergenerational relations. According to the derivation relations in this tree and the applied mutation operators, we construct the test oracle based on the execution state consistency. Since the granularity of code assembly and mutation is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where the bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of MoCo using three widely-used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiment, MoCo detects 64 new bugs of four types in three DL libraries, where 51 bugs have been confirmed, and 13 bugs have been fixed by developers.
- [526] arXiv:2405.07745 [pdf, ps, other]
-
Title: LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource LanguageSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Despite advancements in English-dominant generative large language models, further development is needed for low-resource languages to enhance global accessibility. The primary methods for representing these languages are monolingual and multilingual pretraining. Monolingual pretraining is expensive due to hardware requirements, and multilingual models often have uneven performance across languages. This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages. We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension. The results show that continual training improves language comprehension, as reflected in perplexity scores, and task-specific tuning generally enhances performance of downstream tasks. However, extending the vocabulary shows no substantial benefits. Additionally, while larger models improve task performance with few-shot tuning, multilingual models perform worse than their monolingual counterparts when adapted.
- [527] arXiv:2405.07748 [pdf, ps, other]
-
Title: Neural Network Compression for Reinforcement Learning TasksComments: 14 pages, 6 figuresSubjects: Machine Learning (cs.LG)
In real applications of Reinforcement Learning (RL), such as robotics, low latency and energy efficient inference is very desired. The use of sparsity and pruning for optimizing Neural Network inference, and particularly to improve energy and latency efficiency, is a standard technique. In this work, we perform a systematic investigation of applying these optimization techniques for different RL algorithms in different RL environments, yielding up to a 400-fold reduction in the size of neural networks.
- [528] arXiv:2405.07749 [pdf, ps, other]
-
Title: DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured SystemsJournal-ref: Proceedings of the 38th ACM International Conference on Supercomputing (ICS '24), June 4--7, 2024, Kyoto, JapanSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Anomaly detection in distributed systems such as High-Performance Computing (HPC) clusters is vital for early fault detection, performance optimisation, security monitoring, reliability in general but also operational insights. Deep Neural Networks have seen successful use in detecting long-term anomalies in multidimensional data, originating for instance from industrial or medical systems, or weather prediction. A downside of such methods is that they require a static input size, or lose data through cropping, sampling, or other dimensionality reduction methods, making deployment on systems with variability on monitored data channels, such as computing clusters difficult. To address these problems, we present DeepHYDRA (Deep Hybrid DBSCAN/Reduction-Based Anomaly Detection) which combines DBSCAN and learning-based anomaly detection. DBSCAN clustering is used to find point anomalies in time-series data, mitigating the risk of missing outliers through loss of information when reducing input data to a fixed number of channels. A deep learning-based time-series anomaly detection method is then applied to the reduced data in order to identify long-term outliers. This hybrid approach reduces the chances of missing anomalies that might be made indistinguishable from normal data by the reduction process, and likewise enables the algorithm to be scalable and tolerate partial system failures while retaining its detection capabilities. Using a subset of the well-known SMD dataset family, a modified variant of the Eclipse dataset, as well as an in-house dataset with a large variability in active data channels, made publicly available with this work, we furthermore analyse computational intensity, memory footprint, and activation counts. DeepHYDRA is shown to reliably detect different types of anomalies in both large and complex datasets.
- [529] arXiv:2405.07751 [pdf, ps, html, other]
-
Title: Integrating supervised and unsupervised learning approaches to unveil critical process inputsParis Papavasileiou, Dimitrios G. Giovanis, Gabriele Pozzetti, Martin Kathrein, Christoph Czettl, Ioannis G. Kevrekidis, Andreas G. Boudouvis, Stéphane P. A. Bordas, Eleni D. KoronakiSubjects: Machine Learning (cs.LG)
This study introduces a machine learning framework tailored to large-scale industrial processes characterized by a plethora of numerical and categorical inputs. The framework aims to (i) discern critical parameters influencing the output and (ii) generate accurate out-of-sample qualitative and quantitative predictions of production outcomes. Specifically, we address the pivotal question of the significance of each input in shaping the process outcome, using an industrial Chemical Vapor Deposition (CVD) process as an example. The initial objective involves merging subject matter expertise and clustering techniques exclusively on the process output, here, coating thickness measurements at various positions in the reactor. This approach identifies groups of production runs that share similar qualitative characteristics, such as film mean thickness and standard deviation. In particular, the differences of the outcomes represented by the different clusters can be attributed to differences in specific inputs, indicating that these inputs are critical for the production outcome. Leveraging this insight, we subsequently implement supervised classification and regression methods using the identified critical process inputs. The proposed methodology proves to be valuable in scenarios with a multitude of inputs and insufficient data for the direct application of deep learning techniques, providing meaningful insights into the underlying processes.
- [530] arXiv:2405.07759 [pdf, ps, html, other]
-
Title: MADRL-Based Rate Adaptation for 360$\degree$ Video Streaming with Multi-Viewpoint PredictionComments: Accepted by IEEE Internet of Things JournalSubjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV)
Over the last few years, 360$\degree$ video traffic on the network has grown significantly. A key challenge of 360$\degree$ video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single viewport prediction to reduce bandwidth consumption. However, the performance of models for single-viewpoint prediction is severely limited by the inherent uncertainty in head movement, which can not cope with the sudden movement of users very well. This paper first presents a multimodal spatial-temporal attention transformer to generate multiple viewpoint trajectories with their probabilities given a historical trajectory. The proposed method models viewpoint prediction as a classification problem and uses attention mechanisms to capture the spatial and temporal characteristics of input video frames and viewpoint trajectories for multi-viewpoint prediction. After that, a multi-agent deep reinforcement learning (MADRL)-based ABR algorithm utilizing multi-viewpoint prediction for 360$\degree$ video streaming is proposed for maximizing different QoE objectives under various network conditions. We formulate the ABR problem as a decentralized partially observable Markov decision process (Dec-POMDP) problem and present a MAPPO algorithm based on centralized training and decentralized execution (CTDE) framework to solve the problem. The experimental results show that our proposed method improves the defined QoE metric by up to 85.5\% compared to existing ABR methods.
- [531] arXiv:2405.07760 [pdf, ps, other]
-
Title: CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian OptimizationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bayesian optimization (BO) is a popular approach for optimizing expensive-to-evaluate black-box objective functions. An important challenge in BO is its application to high-dimensional search spaces due in large part to the curse of dimensionality. One way to overcome this challenge is to focus on local BO methods that aim to efficiently learn gradients, which have shown strong empirical performance on a variety of high-dimensional problems including policy search in reinforcement learning (RL). However, current local BO methods assume access to only a single high-fidelity information source whereas, in many engineering and control problems, one has access to multiple cheaper approximations of the objective. We propose a novel algorithm, Cost-Aware Gradient Entropy Search (CAGES), for local BO of multi-fidelity black-box functions. CAGES makes no assumption about the relationship between different information sources, making it more flexible than other multi-fidelity methods. It also employs a new type of information-theoretic acquisition function, which enables systematic identification of samples that maximize the information gain about the unknown gradient per cost of the evaluation. We demonstrate CAGES can achieve significant performance improvements compared to other state-of-the-art methods on a variety of synthetic and benchmark RL problems.
- [532] arXiv:2405.07761 [pdf, ps, other]
-
Title: LLM4ED: Large Language Models for Automatic Equation DiscoverySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC); Mathematical Physics (math-ph); Applications (stat.AP)
Equation discovery is aimed at directly extracting physical laws from data and has emerged as a pivotal research domain. Previous methods based on symbolic mathematics have achieved substantial advancements, but often require the design of implementation of complex algorithms. In this paper, we introduce a new framework that utilizes natural language-based prompts to guide large language models (LLMs) in automatically mining governing equations from data. Specifically, we first utilize the generation capability of LLMs to generate diverse equations in string form, and then evaluate the generated equations based on observations. In the optimization phase, we propose two alternately iterated strategies to optimize generated equations collaboratively. The first strategy is to take LLMs as a black-box optimizer and achieve equation self-improvement based on historical samples and their performance. The second strategy is to instruct LLMs to perform evolutionary operators for global search. Experiments are extensively conducted on both partial differential equations and ordinary differential equations. Results demonstrate that our framework can discover effective equations to reveal the underlying physical laws under various nonlinear dynamic systems. Further comparisons are made with state-of-the-art models, demonstrating good stability and usability. Our framework substantially lowers the barriers to learning and applying equation discovery techniques, demonstrating the application potential of LLMs in the field of knowledge discovery.
- [533] arXiv:2405.07764 [pdf, ps, other]
-
Title: LGDE: Local Graph-based Dictionary ExpansionSubjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Expanding a dictionary of pre-selected keywords is crucial for tasks in information retrieval, such as database query and online data collection. Here we propose Local Graph-based Dictionary Expansion (LGDE), a method that uses tools from manifold learning and network science for the data-driven discovery of keywords starting from a seed dictionary. At the heart of LGDE lies the creation of a word similarity graph derived from word embeddings and the application of local community detection based on graph diffusion to discover semantic neighbourhoods of pre-defined seed keywords. The diffusion in the local graph manifold allows the exploration of the complex nonlinear geometry of word embeddings and can capture word similarities based on paths of semantic association. We validate our method on a corpus of hate speech-related posts from Reddit and Gab and show that LGDE enriches the list of keywords and achieves significantly better performance than threshold methods based on direct word similarities. We further demonstrate the potential of our method through a real-world use case from communication science, where LGDE is evaluated quantitatively on data collected and analysed by domain experts by expanding a conspiracy-related dictionary.
- [534] arXiv:2405.07765 [pdf, ps, other]
-
Title: TANQ: An open domain dataset of table answered questionsComments: 10 pagesSubjects: Computation and Language (cs.CL)
Language models, potentially augmented with tool usage such as retrieval are becoming the go-to means of answering questions. Understanding and answering questions in real-world settings often requires retrieving information from different sources, processing and aggregating data to extract insights, and presenting complex findings in form of structured artifacts such as novel tables, charts, or infographics. In this paper, we introduce TANQ, the first open domain question answering dataset where the answers require building tables from information across multiple sources. We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups. Our best-performing baseline, GPT4 reaches an overall F1 score of 29.1, lagging behind human performance by 19.7 points. We analyse baselines' performance across different dataset attributes such as different skills required for this task, including multi-hop reasoning, math operations, and unit conversions. We further discuss common failures in model-generated answers, suggesting that TANQ is a complex task with many challenges ahead.
- [535] arXiv:2405.07766 [pdf, ps, other]
-
Title: Challenges and Opportunities of NLP for HR Applications: A Discussion PaperComments: 10 pages, 2 figures, 1 tableSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Over the course of the recent decade, tremendous progress has been made in the areas of machine learning and natural language processing, which opened up vast areas of potential application use cases, including hiring and human resource management. We review the use cases for text analytics in the realm of human resources/personnel management, including actually realized as well as potential but not yet implemented ones, and we analyze the opportunities and risks of these.
- [536] arXiv:2405.07767 [pdf, ps, other]
-
Title: Synthetic Test Collections for Retrieval EvaluationComments: SIGIR 2024Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Test collections play a vital role in evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investigate whether it is possible to use LLMs to construct fully synthetic test collections by generating not only synthetic judgments but also synthetic queries. In particular, we analyse whether it is possible to construct reliable synthetic test collections and the potential risks of bias such test collections may exhibit towards LLM-based models. Our experiments indicate that using LLMs it is possible to construct synthetic test collections that can reliably be used for retrieval evaluation.
- [537] arXiv:2405.07769 [pdf, ps, other]
-
Title: $\alpha$VIL: Learning to Leverage Auxiliary Tasks for Multitask LearningComments: 11 pages, 1 algorithm, 4 figures, 2 tablesSubjects: Machine Learning (cs.LG)
Multitask Learning is a Machine Learning paradigm that aims to train a range of (usually related) tasks with the help of a shared model. While the goal is often to improve the joint performance of all training tasks, another approach is to focus on the performance of a specific target task, while treating the remaining ones as auxiliary data from which to possibly leverage positive transfer towards the target during training. In such settings, it becomes important to estimate the positive or negative influence auxiliary tasks will have on the target. While many ways have been proposed to estimate task weights before or during training they typically rely on heuristics or extensive search of the weighting space. We propose a novel method called $\alpha$-Variable Importance Learning ($\alpha$VIL) that is able to adjust task weights dynamically during model training, by making direct use of task-specific updates of the underlying model's parameters between training epochs. Experiments indicate that $\alpha$VIL is able to outperform other Multitask Learning approaches in a variety of settings. To our knowledge, this is the first attempt at making direct use of model updates for task weight estimation.
- [538] arXiv:2405.07773 [pdf, ps, other]
-
Title: Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AIComments: 9 pages, 1 figure, 1 tableSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
"Human-aware" has become a popular keyword used to describe a particular class of AI systems that are designed to work and interact with humans. While there exists a surprising level of consistency among the works that use the label human-aware, the term itself mostly remains poorly understood. In this work, we retroactively try to provide an account of what constitutes a human-aware AI system. We see that human-aware AI is a design-oriented paradigm, one that focuses on the need for modeling the humans it may interact with. Additionally, we see that this paradigm offers us intuitive dimensions to understand and categorize the kinds of interactions these systems might have with humans. We show the pedagogical value of these dimensions by using them as a tool to understand and review the current landscape of work related to human-AI systems that purport some form of human modeling. To fit the scope of a workshop paper, we specifically narrowed our review to papers that deal with sequential decision-making and were published in a major AI conference in the last three years. Our analysis helps identify the space of potential research problems that are currently being overlooked. We perform additional analysis on the degree to which these works make explicit reference to results from social science and whether they actually perform user-studies to validate their systems. We also provide an accounting of the various AI methods used by these works.
- [539] arXiv:2405.07774 [pdf, ps, other]
-
Title: Open Source in Lab ManagementComments: 2024 ISMRM & ISMRT Annual MeetingSubjects: Computers and Society (cs.CY)
This document explores the advantages of integrating open source software and practices in managing a scientific lab, emphasizing reproducibility and the avoidance of pitfalls. It details practical applications from website management using GitHub Pages to organizing datasets in compliance with BIDS standards, highlights the importance of continuous testing for data integrity, IT management through Ansible for efficient system configuration, open source software development. The broader goal is to promote transparent, reproducible science by adopting open source tools. This approach not only saves time but exposes students to best practices, enhancing the transparency and reproducibility of scientific research.
- [540] arXiv:2405.07776 [pdf, ps, other]
-
Title: SAR Image Synthesis with Diffusion ModelsComments: Published at IEEE Radar Conference 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
In recent years, diffusion models (DMs) have become a popular method for generating synthetic data. By achieving samples of higher quality, they quickly became superior to generative adversarial networks (GANs) and the current state-of-the-art method in generative modeling. However, their potential has not yet been exploited in radar, where the lack of available training data is a long-standing problem. In this work, a specific type of DMs, namely denoising diffusion probabilistic model (DDPM) is adapted to the SAR domain. We investigate the network choice and specific diffusion parameters for conditional and unconditional SAR image generation. In our experiments, we show that DDPM qualitatively and quantitatively outperforms state-of-the-art GAN-based methods for SAR image generation. Finally, we show that DDPM profits from pretraining on largescale clutter data, generating SAR images of even higher quality.
- [541] arXiv:2405.07777 [pdf, ps, other]
-
Title: GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often face challenges in handling long-range dependencies, whereas Transformers are constrained by computational efficiency limitations. Recent breakthroughs in state-space model (e.g., Mamba) has attracted significant attention due to its near-linear computational efficiency and superior performance, prompting our investigation into its potential for SR problem. To this end, we propose the Gradient-guided Mamba for Spectral Reconstruction from RGB Images, dubbed GMSR-Net. GMSR-Net is a lightweight model characterized by a global receptive field and linear computational complexity. Its core comprises multiple stacked Gradient Mamba (GM) blocks, each featuring a tri-branch structure. In addition to benefiting from efficient global feature representation by Mamba block, we further innovatively introduce spatial gradient attention and spectral gradient attention to guide the reconstruction of spatial and spectral cues. GMSR-Net demonstrates a significant accuracy-efficiency trade-off, achieving state-of-the-art performance while markedly reducing the number of parameters and computational burdens. Compared to existing approaches, GMSR-Net slashes parameters and FLOPS by substantial margins of 10 times and 20 times, respectively. Code is available at this https URL.
- [542] arXiv:2405.07778 [pdf, ps, other]
-
Title: A Comprehensive Analysis of Static Word Embeddings for TurkishJournal-ref: Expert Systems with Applications Volume 252, Part A, 15 October 2024, 124123Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Word embeddings are fixed-length, dense and distributed word representations that are used in natural language processing (NLP) applications. There are basically two types of word embedding models which are non-contextual (static) models and contextual models. The former method generates a single embedding for a word regardless of its context, while the latter method produces distinct embeddings for a word based on the specific contexts in which it appears. There are plenty of works that compare contextual and non-contextual embedding models within their respective groups in different languages. However, the number of studies that compare the models in these two groups with each other is very few and there is no such study in Turkish. This process necessitates converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights about the suitability of different embedding models in different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in the field of Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available.
- [543] arXiv:2405.07780 [pdf, ps, other]
-
Title: Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail RecognitionZhiyong Yang, Qianqian Xu, Zitai Wang, Sicong Li, Boyu Han, Shilong Bao, Xiaochun Cao, Qingming HuangSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
This paper explores test-agnostic long-tail recognition, a challenging long-tail task where the test label distributions are unknown and arbitrarily imbalanced. We argue that the variation in these distributions can be broken down hierarchically into global and local levels. The global ones reflect a broad range of diversity, while the local ones typically arise from milder changes, often focused on a particular neighbor. Traditional methods predominantly use a Mixture-of-Expert (MoE) approach, targeting a few fixed test label distributions that exhibit substantial global variations. However, the local variations are left unconsidered. To address this issue, we propose a new MoE strategy, $\mathsf{DirMixE}$, which assigns experts to different Dirichlet meta-distributions of the label distribution, each targeting a specific aspect of local variations. Additionally, the diversity among these Dirichlet meta-distributions inherently captures global variations. This dual-level approach also leads to a more stable objective function, allowing us to sample different test distributions better to quantify the mean and variance of performance outcomes. Theoretically, we show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization. Comprehensive experiments across multiple benchmarks confirm the effectiveness of $\mathsf{DirMixE}$. The code is available at \url{this https URL}.
- [544] arXiv:2405.07781 [pdf, ps, other]
-
Title: Requirements Engineering for Research Software: A VisionComments: Accepted at the 32nd IEEE International Requirements Engineering 2024 (RE) conferenceSubjects: Software Engineering (cs.SE)
Modern science is relying on software more than ever. The behavior and outcomes of this software shape the scientific and public discourse on important topics like climate change, economic growth, or the spread of infections. Most researchers creating software for scientific purposes are not trained in Software Engineering. As a consequence, research software is often developed ad hoc without following stringent processes. With this paper, we want to characterize research software as a new application domain that needs attention from the Requirements Engineering community. We conducted an exploratory study based on 8 interviews with 12 researchers who develop software. We describe how researchers elicit, document, and analyze requirements for research software and what processes they follow. From this, we derive specific challenges and describe a vision of Requirements Engineering for research software.
- [545] arXiv:2405.07782 [pdf, ps, other]
-
Title: Is Interpretable Machine Learning Effective at Feature Selection for Neural Learning-to-Rank?Comments: Published at ECIR 2024 as a long paper. 13 pages excl. reference, 20 pages incl. referenceJournal-ref: Advances in Information Retrieval - 46th European Conference on Information Retrieval, {ECIR} 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part {IV}Subjects: Information Retrieval (cs.IR)
Neural ranking models have become increasingly popular for real-world search and recommendation systems in recent years. Unlike their tree-based counterparts, neural models are much less interpretable. That is, it is very difficult to understand their inner workings and answer questions like how do they make their ranking decisions? or what document features do they find important? This is particularly disadvantageous since interpretability is highly important for real-world systems. In this work, we explore feature selection for neural learning-to-rank (LTR). In particular, we investigate six widely-used methods from the field of interpretable machine learning (ML) and introduce our own modification, to select the input features that are most important to the ranking behavior. To understand whether these methods are useful for practitioners, we further study whether they contribute to efficiency enhancement. Our experimental results reveal a large feature redundancy in several LTR benchmarks: the local selection method TabNet can achieve optimal ranking performance with less than 10 features; the global methods, particularly our G-L2X, require slightly more selected features, but exhibit higher potential in improving efficiency. We hope that our analysis of these feature selection methods will bring the fields of interpretable ML and LTR closer together.
- [546] arXiv:2405.07784 [pdf, ps, other]
-
Title: Generating Human Motion in 3D Scenes from Text DescriptionsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Generating human motions from textual descriptions has gained growing research interest due to its wide range of applications. However, only a few works consider human-scene interactions together with text conditions, which is crucial for visual and physical realism. This paper focuses on the task of generating human motions in 3D indoor scenes given text descriptions of the human-scene interactions. This task presents challenges due to the multi-modality nature of text, scene, and motion, as well as the need for spatial reasoning. To address these challenges, we propose a new approach that decomposes the complex problem into two more manageable sub-problems: (1) language grounding of the target object and (2) object-centric motion generation. For language grounding of the target object, we leverage the power of large language models. For motion generation, we design an object-centric scene representation for the generative model to focus on the target object, thereby reducing the scene complexity and facilitating the modeling of the relationship between human motions and the object. Experiments demonstrate the better motion quality of our approach compared to baselines and validate our design choices.
- [547] arXiv:2405.07785 [pdf, ps, other]
-
Title: Capacity of Frequency-based Channels: Encoding Information in Molecular ConcentrationsSubjects: Information Theory (cs.IT)
We consider a molecular channel, in which messages are encoded to the frequency of objects (or concentration of molecules) in a pool, and whose output during reading time is a noisy version of the input frequencies, as obtained by sampling with replacement from the pool. We tightly characterize the capacity of this channel using upper and lower bounds, when the number of objects in the pool of objects is constrained. We apply this result to the DNA storage channel in the short-molecule regime, and show that even though the capacity of this channel is technically zero, it can still achieve a large information density.
- [548] arXiv:2405.07787 [pdf, ps, other]
-
Title: A Note on Equivalent Conditions for MajorizationComments: Published in: AIMS Mathematics - Special Issue on Mathematical Foundations of Information TheoryJournal-ref: AIMS Mathematics 2024, 9(4), 8641-8660Subjects: Information Theory (cs.IT)
In this paper, we introduce novel characterizations of the classical concept of majorization in terms of upper triangular (resp., lower triangular) row-stochastic matrices, and in terms of sequences of linear transforms on vectors. We used our new characterizations of majorization to derive an improved entropy inequality.
- [549] arXiv:2405.07788 [pdf, ps, other]
-
Title: DEPTH: Discourse Education through Pre-Training HierarchicallyComments: 28 pages, 10 figures, 8 tablesSubjects: Computation and Language (cs.CL)
Language Models (LMs) often struggle with linguistic understanding at the discourse level, even though discourse patterns such as coherence, cohesion, and narrative flow are prevalent in their pre-training data. Current methods address these challenges only after the pre-training phase, relying on expensive human annotated data to align the model. To improve the discourse capabilities of LMs already at the pre-training stage, we introduce DEPTH, an encoder-decoder model that learns to represent sentences using a discourse-oriented pre-training objective. DEPTH combines hierarchical sentence representations with two objectives: (1) Sentence Un-Shuffling, and (2) Span-Corruption. This approach trains the model to represent both sub-word-level and sentence-level dependencies over a massive amount of unstructured text. When trained either from scratch or continuing from a pre-trained T5 checkpoint, DEPTH learns semantic and discourse-level representations faster than T5, outperforming it in span-corruption loss despite the additional sentence-un-shuffling objective. Evaluations on the GLUE, DiscoEval, and NI benchmarks demonstrate DEPTH's ability to quickly learn diverse downstream tasks, which require syntactic, semantic, and discourse capabilities. Overall, our approach extends the discourse capabilities of T5, while minimally impacting other natural language understanding (NLU) capabilities in the resulting LM.
- [550] arXiv:2405.07791 [pdf, ps, other]
-
Title: Decentralized Kernel Ridge Regression Based on Data-dependent Random FeatureSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
Random feature (RF) has been widely used for node consistency in decentralized kernel ridge regression (KRR). Currently, the consistency is guaranteed by imposing constraints on coefficients of features, necessitating that the random features on different nodes are identical. However, in many applications, data on different nodes varies significantly on the number or distribution, which calls for adaptive and data-dependent methods that generate different RFs. To tackle the essential difficulty, we propose a new decentralized KRR algorithm that pursues consensus on decision functions, which allows great flexibility and well adapts data on nodes. The convergence is rigorously given and the effectiveness is numerically verified: by capturing the characteristics of the data on each node, while maintaining the same communication costs as other methods, we achieved an average regression accuracy improvement of 25.5\% across six real-world data sets.
- [551] arXiv:2405.07792 [pdf, ps, other]
-
Title: Optimal Matrix Sketching over Sliding WindowsSubjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Matrix sketching, aimed at approximating a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$ consisting of vector streams of length $N$ with a smaller sketching matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}, \ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound and provides a covariance error guarantee of $\varepsilon = \lVert \boldsymbol{A}^\top \boldsymbol{A} - \boldsymbol{B}^\top \boldsymbol{B} \rVert_2/\lVert \boldsymbol{A} \rVert_F^2$. The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix $\boldsymbol{A}_W$, formed by input vectors over the most recent $N$ time units. However, despite recent efforts, whether achieving the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound on sliding windows is possible has remained an open question.
In this paper, we introduce the DS-FD algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of \dsfd across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, validating our theoretical claims and thus confirming the correctness and effectiveness of our algorithm, both theoretically and empirically. - [552] arXiv:2405.07798 [pdf, ps, other]
-
Title: FreeVA: Offline MLLM as Training-Free Video AssistantComments: Preprint. Work in progressSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This paper undertakes an empirical study to revisit the latest advancements in Multimodal Large Language Models (MLLMs): Video Assistant. This study, namely FreeVA, aims to extend existing image-based MLLM to the video domain in a training-free manner. The study provides an essential, yet must-know baseline, and reveals several surprising findings: 1) FreeVA, leveraging only offline image-based MLLM without additional training, excels in zero-shot video question-answering (e.g., MSVD-QA, ActivityNet-QA, and MSRVTT-QA), even surpassing state-of-the-art methods that involve video instruction tuning. 2) While mainstream video-based MLLMs typically initialize with an image-based MLLM (e.g., LLaVA) and then fine-tune using video instruction tuning, the study indicates that utilizing the widely adopted VideoInstruct-100K for video instruction tuning doesn't actually lead to better performance compared to not training at all. 3) The commonly used evaluation metrics in existing works are significantly influenced by changes in the GPT API version over time. If ignored, this could affect the fairness and uniformity of comparisons between different methods and impact the analysis and judgment of researchers in the field. The advancement of MLLMs is currently thriving, drawing numerous researchers into the field. We aim for this work to serve as a plug-and-play, simple yet effective baseline, encouraging the direct evaluation of existing MLLMs in video domain while also standardizing the field of video conversational models to a certain extent. Also, we encourage researchers to reconsider: Have current video MLLM methods truly acquired knowledge beyond image MLLM? Code is available at this https URL
- [553] arXiv:2405.07799 [pdf, ps, other]
-
Title: Collective Decision-Making on Task Allocation FeasibilityComments: 3 Pages, 3 Figures, Accepted to ICRA 2024 Workshop "Breaking Swarm Stereotypes"Subjects: Robotics (cs.RO)
Robot swarms offer the potential to bring several advantages to the real-world applications but deploying them presents challenges in ensuring feasibility across diverse environments. Assessing the feasibility of new tasks for swarms is crucial to ensure the effective utilisation of resources, as well as to provide awareness of the suitability of a swarm solution for a particular task. In this paper, we introduce the concept of distributed feasibility, where the swarm collectively assesses the feasibility of task allocation based on local observations and interactions. We apply Direct Modulation of Majority-based Decisions as our collective decision-making strategy and show that, in a homogeneous setting, the swarm is able to collectively decide whether a given setup has a high or low feasibility as long as the robot-to-task ratio is not near one.
- [554] arXiv:2405.07800 [pdf, ps, other]
-
Title: Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based MethodSubjects: Machine Learning (cs.LG)
Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows that the pursuit of better classification can guide the data imputation process. While some works consider using label information to assist in this task, their simplistic utilization of labels lacks flexibility and may rely on strict assumptions. In this paper, we propose a new framework that effectively leverages supervision information to complete missing data in a manner conducive to classification. Specifically, this framework operates in two stages. Firstly, it leverages labels to supervise the optimization of similarity relationships among data, represented by the kernel matrix, with the goal of enhancing classification accuracy. To mitigate overfitting that may occur during this process, a perturbation variable is introduced to improve the robustness of the framework. Secondly, the learned kernel matrix serves as additional supervision information to guide data imputation through regression, utilizing the block coordinate descent method. The superiority of the proposed method is evaluated on four real-world data sets by comparing it with state-of-the-art imputation methods. Remarkably, our algorithm significantly outperforms other methods when the data is missing more than 60\% of the features
- [555] arXiv:2405.07801 [pdf, ps, other]
-
Title: Deep Learning-Based Object Pose Estimation: A Comprehensive SurveyJian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal MianComments: 27 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey discussing the progress made on different aspects of this area, outstanding challenges, and promising future directions, is missing. To fill this gap, we discuss the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation. Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks, providing readers with a holistic understanding of this field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, as well as reports the performance of current state-of-the-art methods on these benchmarks, thereby facilitating readers in selecting the most suitable method for their application. Finally, the survey identifies key challenges, reviews prevailing trends along with their pros and cons, and identifies promising directions for future research. We also keep tracing the latest works at this https URL.
- [556] arXiv:2405.07803 [pdf, ps, other]
-
Title: Decoding Geometric Properties in Non-Random Data from First Information-Theoretic PrinciplesComments: arXiv admin note: substantial text overlap with arXiv:2303.16045. substantial text overlap with arXiv:2303.16045Subjects: Information Theory (cs.IT); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Information Retrieval (cs.IR); Statistics Theory (math.ST)
Based on the principles of information theory, measure theory, and theoretical computer science, we introduce a univariate signal deconvolution method with a wide range of applications to coding theory, particularly in zero-knowledge one-way communication channels, such as in deciphering messages from unknown generating sources about which no prior knowledge is available and to which no return message can be sent. Our multidimensional space reconstruction method from an arbitrary received signal is proven to be agnostic vis-a-vis the encoding-decoding scheme, computation model, programming language, formal theory, the computable (or semi-computable) method of approximation to algorithmic complexity, and any arbitrarily chosen (computable) probability measure of the events. The method derives from the principles of an approach to Artificial General Intelligence capable of building a general-purpose model of models independent of any arbitrarily assumed prior probability distribution. We argue that this optimal and universal method of decoding non-random data has applications to signal processing, causal deconvolution, topological and geometric properties encoding, cryptography, and bio- and technosignature detection.
- [557] arXiv:2405.07806 [pdf, ps, other]
-
Title: A Decentralized and Self-Adaptive Approach for Monitoring Volatile Edge EnvironmentsComments: Submitted to ACM Transactions on Autonomous and Adaptive SystemsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Edge computing provides resources for IoT workloads at the network edge. Monitoring systems are vital for efficiently managing resources and application workloads by collecting, storing, and providing relevant information about the state of the resources. However, traditional monitoring systems have a centralized architecture for both data plane and control plane, which increases latency, creates a failure bottleneck, and faces challenges in providing quick and trustworthy data in volatile edge environments, especially where infrastructures are often built upon failure-prone, unsophisticated computing and network resources. Thus, we propose DEMon, a decentralized, self-adaptive monitoring system for edge. DEMon leverages the stochastic gossip communication protocol at its core. It develops efficient protocols for information dissemination, communication, and retrieval, avoiding a single point of failure and ensuring fast and trustworthy data access. Its decentralized control enables self-adaptive management of monitoring parameters, addressing the trade-offs between the quality of service of monitoring and resource consumption. We implement the proposed system as a lightweight and portable container-based system and evaluate it through experiments. We also present a use case demonstrating its feasibility. The results show that DEMon efficiently disseminates and retrieves the monitoring information, addressing the challenges of edge monitoring.
- [558] arXiv:2405.07807 [pdf, ps, other]
-
Title: Efficient Synthesis of Symbolic Distributed Protocols by SketchingSubjects: Logic in Computer Science (cs.LO)
We present a novel and efficient method for synthesis of parameterized distributed protocols by sketching. Our method is both syntax-guided and counterexample-guided, and utilizes a fast equivalence reduction technique that enables efficient completion of protocol sketches, often significantly reducing the search space of candidate completions by several orders of magnitude. To our knowledge, our tool, Scythe, is the first synthesis tool for the widely used specification language TLA+. We evaluate Scythe on a diverse benchmark of distributed protocols, demonstrating the ability to synthesize a large scale distributed Raft-based dynamic reconfiguration protocol beyond the scale of what existing synthesis techniques can handle.
- [559] arXiv:2405.07811 [pdf, ps, other]
-
Title: On the quadratic stability of asymmetric Hermite basis and application to plasma physicsSubjects: Numerical Analysis (math.NA)
We analyze why the discretization of linear transport with asymmetric Hermite basis functions can be instable in quadratic norm. The main reason is that the finite truncation of the infinite moment linear system looses the skew-symmetry property with respect to the Gram matrix. Then we propose an original closed formula for the scalar product of any pair of asymmetric basis functions. It makes possible the construction of two simple modifications of the linear systems which recover the skew-symmetry property. By construction the new methods are quadratically stable with respect to the natural $L^2$ norm. We explain how to generalize to other transport equations encountered in numerical plasma physics. Basic numerical tests illustrate the unconditional stability properties of our algorithms.
- [560] arXiv:2405.07812 [pdf, ps, other]
-
Title: Electromagnetic Nanonetworks Beyond 6G: From Wearable and Implantable Networks to On-chip and Quantum CommunicationComments: 22 pages, 8 figures, 3 tables; Accepted for publication at IEEE Journal on Selected Areas in Communications, 2024Subjects: Emerging Technologies (cs.ET); Optics (physics.optics)
Emerging from the symbiotic combination of nanotechnology and communications, the field of nanonetworking has come a long way since its inception more than fifteen years ago. Significant progress has been achieved in several key communication technologies as enablers of the paradigm, as well as in the multiple application areas that it opens. In this paper, the focus is placed on the electromagnetic nanonetworking paradigm, providing an overview of the advances made in wireless nanocommunication technology from microwave through terahertz to optical bands. The characteristics and potential of the compared technologies are then confronted with the requirements and challenges of the broad set of nanonetworking applications in the Internet of NanoThings (IoNT) and on-chip networks paradigms, including quantum computing applications for the first time. Finally, a selection of cross-cutting issues and possible directions for future work are given, aiming to guide researchers and practitioners towards the next generation of electromagnetic nanonetworks.
- [561] arXiv:2405.07813 [pdf, ps, other]
-
Title: Localizing Task Information for Improved Model Merging and CompressionComments: Accepted ICML 2024; The first two authors contributed equally to this work; Project website: this https URLSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints to one multi-task model, but their applicability is reduced by significant performance loss. Previous works have linked these drops to interference in the weight space and erasure of important task-specific features. Instead, in this work we show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights. We propose TALL-masks, a method to identify these task supports given a collection of task vectors and show that one can retrieve >99% of the single task accuracy by applying our masks to the multi-task vector, effectively compressing the individual checkpoints. We study the statistics of intersections among constructed masks and reveal the existence of selfish and catastrophic weights, i.e., parameters that are important exclusively to one task and irrelevant to all tasks but detrimental to multi-task fusion. For this reason, we propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches. Our experiments in vision and NLP benchmarks with up to 20 tasks, show that Consensus Merging consistently improves existing approaches. Furthermore, our proposed compression scheme reduces storage from 57Gb to 8.2Gb while retaining 99.7% of original performance.
- [562] arXiv:2405.07814 [pdf, ps, other]
-
Title: NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
Many aging individuals encounter challenges in effectively tracking their dietary intake, exacerbating their susceptibility to nutrition-related health complications. Self-reporting methods are often inaccurate and suffer from substantial bias; however, leveraging intelligent prediction methods can automate and enhance precision in this process. Recent work has explored using computer vision prediction systems to predict nutritional information from food images. Still, these methods are often tailored to specific situations, require other inputs in addition to a food image, or do not provide comprehensive nutritional information.
This paper aims to enhance the efficacy of dietary intake estimation by leveraging various neural network architectures to directly predict a meal's nutritional content from its image. Through comprehensive experimentation and evaluation, we present NutritionVerse-Direct, a model utilizing a vision transformer base architecture with three fully connected layers that lead to five regression heads predicting calories (kcal), mass (g), protein (g), fat (g), and carbohydrates (g) present in a meal. NutritionVerse-Direct yields a combined mean average error score on the NutritionVerse-Real dataset of 412.6, an improvement of 25.5% over the Inception-ResNet model, demonstrating its potential for improving dietary intake estimation accuracy. - [563] arXiv:2405.07816 [pdf, ps, other]
-
Title: Quick and Accurate Affordance LearningSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Infants learn actively in their environments, shaping their own learning curricula. They learn about their environments' affordances, that is, how local circumstances determine how their behavior can affect the environment. Here we model this type of behavior by means of a deep learning architecture. The architecture mediates between global cognitive map exploration and local affordance learning. Inference processes actively move the simulated agent towards regions where they expect affordance-related knowledge gain. We contrast three measures of uncertainty to guide this exploration: predicted uncertainty of a model, standard deviation between the means of several models (SD), and the Jensen-Shannon Divergence (JSD) between several models. We show that the first measure gets fooled by aleatoric uncertainty inherent in the environment, while the two other measures focus learning on epistemic uncertainty. JSD exhibits the most balanced exploration strategy. From a computational perspective, our model suggests three key ingredients for coordinating the active generation of learning curricula: (1) Navigation behavior needs to be coordinated with local motor behavior for enabling active affordance learning. (2) Affordances need to be encoded locally for acquiring generalized knowledge. (3) Effective active affordance learning mechanisms should use density comparison techniques for estimating expected knowledge gain. Future work may seek collaborations with developmental psychology to model active play in children in more realistic scenarios.
- [564] arXiv:2405.07817 [pdf, ps, other]
-
Title: The Power of Combined Modalities in Interactive Robot LearningSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
This study contributes to the evolving field of robot learning in interaction with humans, examining the impact of diverse input modalities on learning outcomes. It introduces the concept of "meta-modalities" which encapsulate additional forms of feedback beyond the traditional preference and scalar feedback mechanisms. Unlike prior research that focused on individual meta-modalities, this work evaluates their combined effect on learning outcomes. Through a study with human participants, we explore user preferences for these modalities and their impact on robot learning performance. Our findings reveal that while individual modalities are perceived differently, their combination significantly improves learning behavior and usability. This research not only provides valuable insights into the optimization of human-robot interactive task learning but also opens new avenues for enhancing the interactive freedom and scaffolding capabilities provided to users in such settings.
- [565] arXiv:2405.07819 [pdf, ps, other]
-
Title: Local Adjoints for Simultaneous Preaccumulations with Shared InputsComments: 9 pages, 4 figures, 1 tableSubjects: Mathematical Software (cs.MS)
In shared-memory parallel automatic differentiation, shared inputs among simultaneous thread-local preaccumulations lead to data races if Jacobians are accumulated with a single, shared vector of adjoint variables. In this work, we discuss the benefits and tradeoffs of re-enabling such preaccumulations by a transition to suitable local adjoint variables. In particular, we assess the performance of mapped local adjoints in discrete adjoint computations in the multiphysics simulation suite SU2.
- [566] arXiv:2405.07822 [pdf, ps, other]
-
Title: Synthetic Tabular Data Validation: A Divergence-Based ApproachComments: 15 pages, 14 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The ever-increasing use of generative models in various fields where tabular data is used highlights the need for robust and standardized validation metrics to assess the similarity between real and synthetic data. Current methods lack a unified framework and rely on diverse and often inconclusive statistical measures. Divergences, which quantify discrepancies between data distributions, offer a promising avenue for validation. However, traditional approaches calculate divergences independently for each feature due to the complexity of joint distribution modeling. This paper addresses this challenge by proposing a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons. Our core contribution lies in applying a divergence estimator to build a validation metric considering the joint distribution of real and synthetic data. We leverage a probabilistic classifier to approximate the density ratio between datasets, allowing the capture of complex relationships. We specifically calculate two divergences: the well-known Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence. KL divergence offers an established use in the field, while JS divergence is symmetric and bounded, providing a reliable metric. The efficacy of this approach is demonstrated through a series of experiments with varying distribution complexities. The initial phase involves comparing estimated divergences with analytical solutions for simple distributions, setting a benchmark for accuracy. Finally, we validate our method on a real-world dataset and its corresponding synthetic counterpart, showcasing its effectiveness in practical applications. This research offers a significant contribution with applicability beyond tabular data and the potential to improve synthetic data validation in various fields.
- [567] arXiv:2405.07823 [pdf, ps, other]
-
Title: Integrating Multi-Physics Simulations and Machine Learning to Define the Spatter Mechanism and Process Window in Laser Powder Bed FusionSubjects: Machine Learning (cs.LG)
Laser powder bed fusion (LPBF) has shown promise for wide range of applications due to its ability to fabricate freeform geometries and generate a controlled microstructure. However, components generated by LPBF still possess sub-optimal mechanical properties due to the defects that are created during laser-material interactions. In this work, we investigate mechanism of spatter formation, using a high-fidelity modelling tool that was built to simulate the multi-physics phenomena in LPBF. The modelling tool have the capability to capture the 3D resolution of the meltpool and the spatter behavior. To understand spatter behavior and formation, we reveal its properties at ejection and evaluate its variation from the meltpool, the source where it is formed. The dataset of the spatter and the meltpool collected consist of 50 % spatter and 50 % melt pool samples, with features that include position components, velocity components, velocity magnitude, temperature, density and pressure. The relationship between the spatter and the meltpool were evaluated via correlation analysis and machine learning (ML) algorithms for classification tasks. Upon screening different ML algorithms on the dataset, a high accuracy was observed for all the ML models, with ExtraTrees having the highest at 96 % and KNN having the lowest at 94 %.
- [568] arXiv:2405.07826 [pdf, ps, other]
-
Title: A View of How Language Models Will Transform LawSubjects: Computers and Society (cs.CY)
While most commentators have focused exclusively on how LLMs will transform day-to-day law practice, a substantial structural change could be afoot within the legal sector as a whole. Large increases in productivity and attendant cost savings could encourage law firms and corporate legal departments to develop large language models in-house. A ten percent increase in attorney productivity would encourage an average sized 'Big Law' firm to reduce its associate headcount by 300 to 400 lawyers. This represents cost savings of 60 to 120 million dollars - more than enough to pay for the development of a specialized LLM. Eventually, LLMs will push lawyers into highly specialized and nuanced roles. After fully mature LLMs arrive, the lawyer will continue to play a central role in legal practice, but only in non-routine legal tasks. These tasks will primarily involve value judgments, such as the development of precedent or its reversal, or the allocation of property and other scarce resources. This new mix of lawyer-machine labor, where machines primarily carry out routine legal tasks, and lawyers handle the non-routine, will give rise to a growing demand for lawyers who can exercise good judgment and empathize with the winners and losers of social change. Overall, the Article suggests a possible future where there are fewer lawyers and greater consolidation of the legal sector.
- [569] arXiv:2405.07827 [pdf, ps, other]
-
Title: Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable SensorYuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie ZhuComments: Accepted at CVPRw 2024Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called ``UA Free Living Study", which uses an egocentric wearable camera, AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches in the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.
- [570] arXiv:2405.07828 [pdf, ps, other]
-
Title: Can LLMs Help Predict Elections? (Counter)Evidence from the World's Largest DemocracySubjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
The study of how social media affects the formation of public opinion and its influence on political results has been a popular field of inquiry. However, current approaches frequently offer a limited comprehension of the complex political phenomena, yielding inconsistent outcomes. In this work, we introduce a new method: harnessing the capabilities of Large Language Models (LLMs) to examine social media data and forecast election outcomes. Our research diverges from traditional methodologies in two crucial respects. First, we utilize the sophisticated capabilities of foundational LLMs, which can comprehend the complex linguistic subtleties and contextual details present in social media data. Second, we focus on data from X (Twitter) in India to predict state assembly election outcomes. Our method entails sentiment analysis of election-related tweets through LLMs to forecast the actual election results, and we demonstrate the superiority of our LLM-based method against more traditional exit and opinion polls. Overall, our research offers valuable insights into the unique dynamics of Indian politics and the remarkable impact of social media in molding public attitudes within this context.
- [571] arXiv:2405.07834 [pdf, ps, other]
-
Title: Adaptive Human-Swarm Interaction based on Workload Measurement using Functional Near-Infrared SpectroscopyAyodeji O. Abioye, Aleksandra Landowska, William Hunt, Horia Maior, Sarvapali D. Ramchurn, Mohammad Naiseh, Alec Banks, Mohammad D. SooratiComments: This paper consist of 3 pages and contains 2 figures. This abstract paper was presented at the "Breaking Swarm Stereotypes" workshop of the 2024 IEEE International Conference on Robotics and Automation (ICRA) in PACIFICO Yokohama, Japan, held from May 13th to 17th, 2024. this https URLSubjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
One of the challenges of human-swarm interaction (HSI) is how to manage the operator's workload. In order to do this, we propose a novel neurofeedback technique for the real-time measurement of workload using functional near-infrared spectroscopy (fNIRS). The objective is to develop a baseline for workload measurement in human-swarm interaction using fNIRS and to develop an interface that dynamically adapts to the operator's workload. The proposed method consists of using fNIRS device to measure brain activity, process this through a machine learning algorithm, and pass it on to the HSI interface. By dynamically adapting the HSI interface, the swarm operator's workload could be reduced and the performance improved.
- [572] arXiv:2405.07836 [pdf, ps, other]
-
Title: Forecasting with Hyper-TreesComments: Forecasting, Gradient Boosting, Hyper-Networks, LightGBM, Parameter Non-Stationarity, Time Series, XGBoostSubjects: Machine Learning (cs.LG); Methodology (stat.ME)
This paper introduces the concept of Hyper-Trees and offers a new direction in applying tree-based models to time series data. Unlike conventional applications of decision trees that forecast time series directly, Hyper-Trees are designed to learn the parameters of a target time series model. Our framework leverages the gradient-based nature of boosted trees, which allows us to extend the concept of Hyper-Networks to Hyper-Trees and to induce a time-series inductive bias to tree models. By relating the parameters of a target time series model to features, Hyper-Trees address the challenge of parameter non-stationarity and enable tree-based forecasts to extend beyond their initial training range. With our research, we aim to explore the effectiveness of Hyper-Trees across various forecasting scenarios and to expand the application of gradient boosted decision trees past their conventional use in time series forecasting.
- [573] arXiv:2405.07838 [pdf, ps, other]
-
Title: Adaptive Exploration for Data-Efficient General Value Function EvaluationsComments: 20 pages, 9 figures, Under ReviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
General Value Functions (GVFs) (Sutton et al, 2011) are an established way to represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This leaves an open question: how can behavior policy be chosen for data-efficient GVF learning? To address this gap, we propose GVFExplorer, which aims at learning a behavior policy that efficiently gathers data for evaluating multiple GVFs in parallel. This behavior policy selects actions in proportion to the total variance in the return across all GVFs, reducing the number of environmental interactions. To enable accurate variance estimation, we use a recently proposed temporal-difference-style variance estimator. We prove that each behavior policy update reduces the mean squared error in the summed predictions over all GVFs. We empirically demonstrate our method's performance in both tabular representations and nonlinear function approximation.
- [574] arXiv:2405.07839 [pdf, ps, other]
-
Title: Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin DynamicsComments: 28 pages, 13 figures, to appear in ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Replica exchange stochastic gradient Langevin dynamics (reSGLD) is an effective sampler for non-convex learning in large-scale datasets. However, the simulation may encounter stagnation issues when the high-temperature chain delves too deeply into the distribution tails. To tackle this issue, we propose reflected reSGLD (r2SGLD): an algorithm tailored for constrained non-convex exploration by utilizing reflection steps within a bounded domain. Theoretically, we observe that reducing the diameter of the domain enhances mixing rates, exhibiting a \emph{quadratic} behavior. Empirically, we test its performance through extensive experiments, including identifying dynamical systems with physical constraints, simulations of constrained multi-modal distributions, and image classification tasks. The theoretical and empirical findings highlight the crucial role of constrained exploration in improving the simulation efficiency.
- [575] arXiv:2405.07840 [pdf, ps, other]
-
Title: Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLMSubjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. However, many existing efforts concentrate on decoding small vocabulary sets, leaving space for the exploration of open vocabulary continuous text decoding. In this paper, we introduce a novel method, the \textbf{Brain Prompt GPT (BP-GPT)}. By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce a text-to-text baseline and align the fMRI prompt to the text prompt. By introducing the text-to-text baseline, our BP-GPT can extract a more robust brain prompt and promote the decoding of pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement up to $4.61\%$ on METEOR and $2.43\%$ on BERTScore across all the subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive LLM for auditory neural decoding is feasible and effective.
- [576] arXiv:2405.07841 [pdf, ps, other]
-
Title: Sample Selection Bias in Machine Learning for HealthcareVinod Kumar Chauhan, Lei Clifton, Achille Salaün, Huiqi Yvonne Lu, Kim Branson, Patrick Schwab, Gaurav Nigam, David A. CliftonComments: 20 pages and 11 figures (under review)Subjects: Machine Learning (cs.LG)
While machine learning algorithms hold promise for personalised medicine, their clinical adoption remains limited. One critical factor contributing to this restraint is sample selection bias (SSB) which refers to the study population being less representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. Moreover, the existing techniques try to correct the bias by balancing distributions between the study and the target populations, which may result in a loss of predictive performance. To address these problems, our study illustrates the potential risks associated with SSB by examining SSB's impact on the performance of machine learning algorithms. Most importantly, we propose a new research direction for addressing SSB, based on the target population identification rather than the bias correction. Specifically, we propose two independent networks (T-Net) and a multitasking network (MT-Net) for addressing SSB, where one network/task identifies the target subpopulation which is representative of the study population and the second makes predictions for the identified subpopulation. Our empirical results with synthetic and semi-synthetic datasets highlight that SSB can lead to a large drop in the performance of an algorithm for the target population as compared with the study population, as well as a substantial difference in the performance for the target subpopulations that are representative of the selected and the non-selected patients from the study population. Furthermore, our proposed techniques demonstrate robustness across various settings, including different dataset sizes, event rates, and selection rates, outperforming the existing bias correction techniques.
- [577] arXiv:2405.07845 [pdf, ps, other]
-
Title: Multi-Task Learning for Fatigue Detection and Face Recognition of Drivers via Tree-Style Space-Channel Attention Fusion NetworkSubjects: Computer Vision and Pattern Recognition (cs.CV)
In driving scenarios, automobile active safety systems are increasingly incorporating deep learning technology. These systems typically need to handle multiple tasks simultaneously, such as detecting fatigue driving and recognizing the driver's identity. However, the traditional parallel-style approach of combining multiple single-task models tends to waste resources when dealing with similar tasks. Therefore, we propose a novel tree-style multi-task modeling approach for multi-task learning, which rooted at a shared backbone, more dedicated separate module branches are appended as the model pipeline goes deeper. Following the tree-style approach, we propose a multi-task learning model for simultaneously performing driver fatigue detection and face recognition for identifying a driver. This model shares a common feature extraction backbone module, with further separated feature extraction and classification module branches. The dedicated branches exploit and combine spatial and channel attention mechanisms to generate space-channel fused-attention enhanced features, leading to improved detection performance. As only single-task datasets are available, we introduce techniques including alternating updation and gradient accumulation for training our multi-task model using only the single-task datasets. The effectiveness of our tree-style multi-task learning model is verified through extensive validations.
- [578] arXiv:2405.07847 [pdf, ps, other]
-
Title: SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene ModelingSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We present SceneFactory, a workflow-centric and unified framework for incremental scene modeling, that supports conveniently a wide range of applications, such as (unposed and/or uncalibrated) multi-view depth estimation, LiDAR completion, (dense) RGB-D/RGB-L/Mono//Depth-only reconstruction and SLAM. The workflow-centric design uses multiple blocks as the basis for building different production lines. The supported applications, i.e., productions avoid redundancy in their designs. Thus, the focus is on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks in SceneFactory: (1) Mono-SLAM, (2) depth estimation, (3) flexion and (4) scene reconstruction. Furthermore, we propose an unposed & uncalibrated multi-view depth estimation model (U2-MVD) to estimate dense geometry. U2-MVD exploits dense bundle adjustment for solving for poses, intrinsics, and inverse depth. Then a semantic-awared ScaleCov step is introduced to complete the multi-view depth. Relying on U2-MVD, SceneFactory both supports user-friendly 3D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high quality surface and color reconstruction, we propose due-purpose Multi-resolutional Neural Points (DM-NPs) for the first surface accessible Surface Color Field design, where we introduce Improved Point Rasterization (IPR) for point cloud based surface query.
We implement and experiment with SceneFactory to demonstrate its broad practicability and high flexibility. Its quality also competes or exceeds the tightly-coupled state of the art approaches in all tasks. We contribute the code to the community (this https URL). - [579] arXiv:2405.07848 [pdf, ps, other]
-
Title: Positional-Unigram Byte Models for Generalized TLS FingerprintingSubjects: Cryptography and Security (cs.CR)
We use positional-unigram byte models along with maximum likelihood for generalized TLS fingerprinting and empirically show that it is robust to cipher stunting. Our approach creates a set of positional-unigram byte models from client hello messages. Each positional-unigram byte model is a statistical model of TLS client hello traffic created by a client application or process. To fingerprint a TLS connection, we use its client hello, and compute the likelihood as a function of a statistical model. The statistical model that maximizes the likelihood function is the predicted client application for the given client hello. Our data driven approach does not use side-channel information and can be updated on-the-fly. We experimentally validate our method on an internal dataset and show that it is robust to cipher stunting by tracking an unbiased $f_{1}$ score as we synthetically increase randomization.
- [580] arXiv:2405.07850 [pdf, ps, other]
-
Title: Knowledge Graph Embedding in Intent-Based NetworkingComments: Accepted at WIN 2024 (IEEE NetSoft24)Subjects: Networking and Internet Architecture (cs.NI)
This paper presents a novel approach to network management by integrating intent-based networking (IBN) with knowledge graphs (KGs), creating a more intuitive and efficient pipeline for service orchestration. By mapping high-level business intents onto network configurations using KGs, the system dynamically adapts to network changes and service demands, ensuring optimal performance and resource allocation. We utilize knowledge graph embedding (KGE) to acquire context information from the network and service providers. The KGE model is trained using a custom KG and Gaussian embedding model and maps intents to services via service prediction and intent validation processes. The proposed intent lifecycle enables intent translation and assurance by only deploying validated intents according to network and resource availability. We evaluate the trained model for its efficiency in service mapping and intent validation tasks using simulated environments and extensive experiments. The service prediction and intent verification accuracy greater than 80 percent is achieved for the trained KGE model on a custom service orchestration intent knowledge graph (IKG) based on TMForum's intent common model.
- [581] arXiv:2405.07857 [pdf, ps, other]
-
Title: Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse InputsComments: ICML2024 ; Project page is accessible at this https URL ; Code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The multi-plane representation has been highlighted for its fast training and inference across static and dynamic neural radiance fields. This approach constructs relevant features via projection onto learnable grids and interpolating adjacent vertices. However, it has limitations in capturing low-frequency details and tends to overuse parameters for low-frequency features due to its bias toward fine details, despite its multi-resolution concept. This phenomenon leads to instability and inefficiency when training poses are sparse. In this work, we propose a method that synergistically integrates multi-plane representation with a coordinate-based network known for strong bias toward low-frequency signals. The coordinate-based network is responsible for capturing low-frequency details, while the multi-plane representation focuses on capturing fine-grained details. We demonstrate that using residual connections between them seamlessly preserves their own inherent properties. Additionally, the proposed progressive training scheme accelerates the disentanglement of these two features. We empirically show that the proposed method achieves comparable results to explicit encoding with fewer parameters, and particularly, it outperforms others for the static and dynamic NeRFs under sparse inputs.
- [582] arXiv:2405.07863 [pdf, ps, other]
-
Title: RLHF Workflow: From Reward Modeling to Online RLHFHanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong ZhangComments: 26 pages, 8 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. However, existing open-source RLHF projects are still largely confined to the offline learning setting. In this technical report, we aim to fill in this gap and provide a detailed recipe that is easy to reproduce for online iterative RLHF. In particular, since online human feedback is usually infeasible for open-source communities with limited resources, we start by constructing preference models using a diverse set of open-source datasets and use the constructed proxy preference model to approximate human feedback. Then, we discuss the theoretical insights and algorithmic principles behind online iterative RLHF, followed by a detailed practical implementation. Our trained LLM, SFR-Iterative-DPO-LLaMA-3-8B-R, achieves impressive performance on LLM chatbot benchmarks, including AlpacaEval-2, Arena-Hard, and MT-Bench, as well as other academic benchmarks such as HumanEval and TruthfulQA. We have shown that supervised fine-tuning (SFT) and iterative RLHF can obtain state-of-the-art performance with fully open-source datasets. Further, we have made our models, curated datasets, and comprehensive step-by-step code guidebooks publicly available. Please refer to this https URL and this https URL for more detailed information.
- [583] arXiv:2405.07865 [pdf, ps, other]
-
Title: AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous DrivingDaniel Bogdoll, Iramm Hamdard, Lukas Namgyu Rößler, Felix Geisler, Muhammed Bayram, Felix Wang, Jan Imhof, Miguel de Campos, Anushervon Tabarov, Yitian Yang, Hanno Gottschalk, J. Marius ZöllnerComments: Daniel Bogdoll, Iramm Hamdard, and Lukas Namgyu Rößler contributed equallySubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The scale-up of autonomous vehicles depends heavily on their ability to deal with anomalies, such as rare objects on the road. In order to handle such situations, it is necessary to detect anomalies in the first place. Anomaly detection for autonomous driving has made great progress in the past years but suffers from poorly designed benchmarks with a strong focus on camera data. In this work, we propose AnoVox, the largest benchmark for ANOmaly detection in autonomous driving to date. AnoVox incorporates large-scale multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. We propose a formal definition of normality and provide a compliant training dataset. AnoVox is the first benchmark to contain both content and temporal anomalies.
- [584] arXiv:2405.07868 [pdf, ps, other]
-
Title: Boostlet.js: Image processing plugins for the web via JavaScript injectionEdward Gaibor, Shruti Varade, Rohini Deshmukh, Tim Meyer, Mahsa Geshvadi, SangHyuk Kim, Vidhya Sree Narayanappa, Daniel HaehnComments: 5 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Can web-based image processing and visualization tools easily integrate into existing websites without significant time and effort? Our Boostlet.js library addresses this challenge by providing an open-source, JavaScript-based web framework to enable additional image processing functionalities. Boostlet examples include kernel filtering, image captioning, data visualization, segmentation, and web-optimized machine-learning models. To achieve this, Boostlet.js uses a browser bookmark to inject a user-friendly plugin selection tool called PowerBoost into any host website. Boostlet also provides on-site access to a standard API independent of any visualization framework for pixel data and scene manipulation. Web-based Boostlets provide a modular architecture and client-side processing capabilities to apply advanced image-processing techniques using consumer-level hardware. The code is open-source and available.
- [585] arXiv:2405.07870 [pdf, ps, other]
-
Title: Mapping the Invisible: A Framework for Tracking COVID-19 Spread Among College Students with Google Location DataComments: 8 pagesJournal-ref: Latin American Workshop on Data Fusion (LAFUSION 2023), November/2023, pp 1-8, Rio de Janeiro, BrazilSubjects: Software Engineering (cs.SE)
The COVID-19 pandemic and the implementation of social distancing policies have rapidly changed people's visiting patterns, as reflected in mobility data that tracks mobility traffic using location trackers on cell phones. However, the frequency and duration of concurrent occupancy at specific locations govern the transmission rather than the number of customers visiting. Therefore, understanding how people interact in different locations is crucial to target policies, inform contact tracing, and prevention strategies. This study proposes an efficient way to reduce the spread of the virus among on-campus university students by developing a self-developed Google History Location Extractor and Indicator software based on real-world human mobility data. The platform enables policymakers and researchers to explore the possibility of future developments in the epidemic's spread and simulate the outcomes of human mobility and epidemic state under different epidemic control policies. It offers functions for determining potential contacts, assessing individual infection risks, and evaluating the effectiveness of on-campus policies. The proposed multi-functional platform facilitates the screening process by more accurately targeting potential virus carriers and aids in making informed decisions on epidemic control policies, ultimately contributing to preventing and managing future outbreaks.
- [586] arXiv:2405.07875 [pdf, ps, other]
-
Title: Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation TechniquesComments: The Fourth Workshop on Human Evaluation of NLP Systems (HumEval 2024) at LREC-COLING 2024Subjects: Computation and Language (cs.CL)
Rerunning a metric-based evaluation should be more straightforward, and results should be closer, than in a human-based evaluation, especially where code and model checkpoints are made available by the original authors. As this report of our efforts to rerun a metric-based evaluation of a set of single-attribute and multiple-attribute controllable text generation (CTG) techniques shows however, such reruns of evaluations do not always produce results that are the same as the original results, and can reveal errors in the reporting of the original work.
- [587] arXiv:2405.07877 [pdf, ps, other]
-
Title: Optimal accuracy for linear sets of equations with the graph LaplacianSubjects: Numerical Analysis (math.NA); Social and Information Networks (cs.SI); Computation (stat.CO)
We show that certain Graph Laplacian linear sets of equations exhibit optimal accuracy, guaranteeing that the relative error is no larger than the norm of the relative residual and that optimality occurs for carefully chosen right-hand sides. Such sets of equations arise in PageRank and Markov chain theory. We establish new relationships among the PageRank teleportation parameter, the Markov chain discount, and approximations to linear sets of equations. The set of optimally accurate systems can be separated into two groups for an undirected graph -- those that achieve optimality asymptotically with the graph size and those that do not -- determined by the angle between the right-hand side of the linear system and the vector of all ones. We provide supporting numerical experiments.
- [588] arXiv:2405.07883 [pdf, ps, other]
-
Title: Zero-Shot Tokenizer TransferSubjects: Computation and Language (cs.CL)
Language models (LMs) are bound to their tokenizer, which maps raw text to a sequence of vocabulary items (tokens). This restricts their flexibility: for example, LMs trained primarily on English may still perform well in other natural and programming languages, but have vastly decreased efficiency due to their English-centric tokenizer. To mitigate this, we should be able to swap the original LM tokenizer with an arbitrary one, on the fly, without degrading performance. Hence, in this work we define a new problem: Zero-Shot Tokenizer Transfer (ZeTT). The challenge at the core of ZeTT is finding embeddings for the tokens in the vocabulary of the new tokenizer. Since prior heuristics for initializing embeddings often perform at chance level in a ZeTT setting, we propose a new solution: we train a hypernetwork taking a tokenizer as input and predicting the corresponding embeddings. We empirically demonstrate that the hypernetwork generalizes to new tokenizers both with encoder (e.g., XLM-R) and decoder LLMs (e.g., Mistral-7B). Our method comes close to the original models' performance in cross-lingual and coding tasks while markedly reducing the length of the tokenized sequence. We also find that the remaining gap can be quickly closed by continued training on less than 1B tokens. Finally, we show that a ZeTT hypernetwork trained for a base (L)LM can also be applied to fine-tuned variants without extra training. Overall, our results make substantial strides toward detaching LMs from their tokenizer.
- [589] arXiv:2405.07884 [pdf, ps, other]
-
Title: Lai Loss: A Novel Loss Integrating RegularizationComments: 7 pages, 7 figuresSubjects: Machine Learning (cs.LG)
In the field of machine learning, traditional regularization methods generally tend to directly add regularization terms to the loss function. This paper introduces the "Lai loss", a novel loss design that integrates the regularization terms (gradient component) into the traditional loss function through a straightforward geometric ideation. This design innovatively penalizes the gradient vectors through the loss, effectively controlling the model's smoothness and offering the dual benefits of reducing overfitting and avoiding underfitting. Subsequently, we proposed a random sampling method that successfully addresses the challenges associated with its application under large sample conditions. We conducted preliminary experiments using publicly available datasets from Kaggle, demonstrating that the design of Lai loss can control the model's smoothness while ensuring maximum accuracy.
- [590] arXiv:2405.07886 [pdf, ps, other]
-
Title: Russian-Language Multimodal Dataset for Automatic Summarization of Scientific PapersComments: 12 pages, accepted to AINLSubjects: Computation and Language (cs.CL)
The paper discusses the creation of a multimodal dataset of Russian-language scientific papers and testing of existing language models for the task of automatic text summarization. A feature of the dataset is its multimodal data, which includes texts, tables and figures. The paper presents the results of experiments with two language models: Gigachat from SBER and YandexGPT from Yandex. The dataset consists of 420 papers and is publicly available on this https URL.
- [591] arXiv:2405.07887 [pdf, ps, other]
-
Title: A Second-Order Audio VCO-ADC with 103-dB-A Dynamic Range and Binary-Weighted Internal ArchitectureComments: 11 pages, 15 figuresSubjects: Systems and Control (eess.SY)
One of the limitations of conventional VCO-ADCs is the restriction to first-order noise shaping. True-VCO architectures have been proposed to increase the noise-shaping order by cascading several VCO integrators, but without requiring analog feedback loops. A high noise shaping order allows to reduce the input VCO frequency compared to a conventional VCO-ADC with similar dynamic range, which improves power consumption. Prior-art True-VCO architectures represent state variables either with a thermometer code or with a single-bit. Thermometer encoding is a natural choice when ring oscillators are selected as loop filter integrators. However, chip area restrictions force thermometer-encoded state variables to have few levels. A reduced number of levels in the state variables limits the dynamic range of True VCO-ADCs. In this paper, we show experimentally a second-order audio VCO-based ADC which uses ring oscillators as integrators but employs Gray and binary encoding for state variables. As a consequence, the complexity and area of the True-VCO architecture is reduced, breaking the barrier that limits the dynamic range of prior designs. The implemented chip shows a dynamic range of 103~dB achieving a peak SNDR of 76.5 dB-A with a power of 250 $\mu$W occupying 0.095 $\text{mm}^2$ in 130 nm CMOS.
- [592] arXiv:2405.07892 [pdf, ps, other]
-
Title: All Nodes are created Not Equal: Node-Specific Layer Aggregation and Filtration for GNNShilong Wang, Hao Wu, Yifan Duan, Guibin Zhang, Guohao Li, Yuxuan Liang, Shirui Pan, Kun Wang, Yang WangSubjects: Machine Learning (cs.LG)
The ever-designed Graph Neural Networks, though opening a promising path for the modeling of the graph-structure data, unfortunately introduce two daunting obstacles to their deployment on devices. (I) Most of existing GNNs are shallow, due mostly to the over-smoothing and gradient-vanish problem as they go deeper as convolutional architectures. (II) The vast majority of GNNs adhere to the homophily assumption, where the central node and its adjacent nodes share the same label. This assumption often poses challenges for many GNNs working with heterophilic graphs. Addressing the aforementioned issue has become a looming challenge in enhancing the robustness and scalability of GNN applications. In this paper, we take a comprehensive and systematic approach to overcoming the two aforementioned challenges for the first time. We propose a Node-Specific Layer Aggregation and Filtration architecture, termed NoSAF, a framework capable of filtering and processing information from each individual nodes. NoSAF introduces the concept of "All Nodes are Created Not Equal" into every layer of deep networks, aiming to provide a reliable information filter for each layer's nodes to sieve out information beneficial for the subsequent layer. By incorporating a dynamically updated codebank, NoSAF dynamically optimizes the optimal information outputted downwards at each layer. This effectively overcomes heterophilic issues and aids in deepening the network. To compensate for the information loss caused by the continuous filtering in NoSAF, we also propose NoSAF-D (Deep), which incorporates a compensation mechanism that replenishes information in every layer of the model, allowing NoSAF to perform meaningful computations even in very deep layers.
- [593] arXiv:2405.07893 [pdf, ps, other]
-
Title: Science based AI model certification for new operational environments with application in traffic state estimationComments: 7 Pages, 5 figures, \c{opyright}2024 IEEE INTERNATIONAL CONFERENCE on ELECTRO/INFORMATION TECHNOLOGYSubjects: Artificial Intelligence (cs.AI)
The expanding role of Artificial Intelligence (AI) in diverse engineering domains highlights the challenges associated with deploying AI models in new operational environments, involving substantial investments in data collection and model training. Rapid application of AI necessitates evaluating the feasibility of utilizing pre-trained models in unobserved operational settings with minimal or no additional data. However, interpreting the opaque nature of AI's black-box models remains a persistent challenge. Addressing this issue, this paper proposes a science-based certification methodology to assess the viability of employing pre-trained data-driven models in new operational environments. The methodology advocates a profound integration of domain knowledge, leveraging theoretical and analytical models from physics and related disciplines, with data-driven AI models. This novel approach introduces tools to facilitate the development of secure engineering systems, providing decision-makers with confidence in the trustworthiness and safety of AI-based models across diverse environments characterized by limited training data and dynamic, uncertain conditions. The paper demonstrates the efficacy of this methodology in real-world safety-critical scenarios, particularly in the context of traffic state estimation. Through simulation results, the study illustrates how the proposed methodology efficiently quantifies physical inconsistencies exhibited by pre-trained AI models. By utilizing analytical models, the methodology offers a means to gauge the applicability of pre-trained AI models in new operational environments. This research contributes to advancing the understanding and deployment of AI models, offering a robust certification framework that enhances confidence in their reliability and safety across a spectrum of operational conditions.
- [594] arXiv:2405.07896 [pdf, ps, other]
-
Title: Almanac Copilot: Towards Autonomous Electronic Health Record NavigationCyril Zakka, Joseph Cho, Gracia Fahed, Rohan Shad, Michael Moor, Robyn Fong, Dhamanpreet Kaur, Vishnu Ravi, Oliver Aalami, Roxana Daneshjou, Akshay Chaudhari, William HiesingerSubjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Clinicians spend large amounts of time on clinical documentation, and inefficiencies impact quality of care and increase clinician burnout. Despite the promise of electronic medical records (EMR), the transition from paper-based records has been negatively associated with clinician wellness, in part due to poor user experience, increased burden of documentation, and alert fatigue. In this study, we present Almanac Copilot, an autonomous agent capable of assisting clinicians with EMR-specific tasks such as information retrieval and order placement. On EHR-QA, a synthetic evaluation dataset of 300 common EHR queries based on real patient data, Almanac Copilot obtains a successful task completion rate of 74% (n = 221 tasks) with a mean score of 2.45 over 3 (95% CI:2.34-2.56). By automating routine tasks and streamlining the documentation process, our findings highlight the significant potential of autonomous agents to mitigate the cognitive load imposed on clinicians by current EMR systems.
- [595] arXiv:2405.07901 [pdf, ps, other]
-
Title: Physically Consistent Online Inertial Adaptation for Humanoid Loco-manipulationComments: Submitted to the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)Subjects: Robotics (cs.RO)
The ability to accomplish manipulation and locomotion tasks in the presence of significant time-varying external loads is a remarkable skill of humans that has yet to be replicated convincingly by humanoid robots. Such an ability will be a key requirement in the environments we envision deploying our robots: dull, dirty, and dangerous. External loads constitute a large model bias, which is typically unaccounted for. In this work, we enable our humanoid robot to engage in loco-manipulation tasks in the presence of significant model bias due to external loads. We propose an online estimation and control framework involving the combination of a physically consistent extended Kalman filter for inertial parameter estimation coupled to a whole-body controller. We showcase our results both in simulation and in hardware, where weights are mounted on Nadia's wrist links as a proxy for engaging in tasks where large external loads are applied to the robot.
- [596] arXiv:2405.07908 [pdf, ps, other]
-
Title: Collaborative Planar Pushing of Polytopic Objects with Multiple Robots in Complex ScenesComments: 16 pages, 23 figuresSubjects: Robotics (cs.RO)
Pushing is a simple yet effective skill for robots to interact with and further change the environment. Related work has been mostly focused on utilizing it as a non-prehensile manipulation primitive for a robotic manipulator. However, it can also be beneficial for low-cost mobile robots that are not equipped with a manipulator. This work tackles the general problem of controlling a team of mobile robots to push collaboratively polytopic objects within complex obstacle-cluttered environments. It incorporates several characteristic challenges for contact-rich tasks such as the hybrid switching among different contact modes and under-actuation due to constrained contact forces. The proposed method is based on hybrid optimization over a sequence of possible modes and the associated pushing forces, where (i) a set of sufficient modes is generated with a multi-directional feasibility estimation, based on quasi-static analyses for general objects and any number of robots; (ii) a hierarchical hybrid search algorithm is designed to iteratively decompose the navigation path via arch segments and select the optimal parameterized mode; and (iii) a nonlinear model predictive controller is proposed to track the desired pushing velocities adaptively online for each robot. The proposed framework is complete under mild assumptions. Its efficiency and effectiveness are validated in high-fidelity simulations and hardware experiments. Robustness to motion and actuation uncertainties is also demonstrated.
- [597] arXiv:2405.07911 [pdf, ps, other]
-
Title: Slice closures of indexed languages and word equations with counting constraintsComments: 12 pages, accepted for publication at LICS 2024Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO); Group Theory (math.GR)
Indexed languages are a classical notion in formal language theory. As the language equivalent of second-order pushdown automata, they have received considerable attention in higher-order model checking. Unfortunately, counting properties are notoriously difficult to decide for indexed languages: So far, all results about non-regular counting properties show undecidability.
In this paper, we initiate the study of slice closures of (Parikh images of) indexed languages. A slice is a set of vectors of natural numbers such that membership of $u,u+v,u+w$ implies membership of $u+v+w$. Our main result is that given an indexed language $L$, one can compute a semilinear representation of the smallest slice containing $L$'s Parikh image.
We present two applications. First, one can compute the set of all affine relations satisfied by the Parikh image of an indexed language. In particular, this answers affirmatively a question by Kobayashi: Is it decidable whether in a given indexed language, every word has the same number of $a$'s as $b$'s.
As a second application, we show decidability of (systems of) word equations with rational constraints and a class of counting constraints: These allow us to look for solutions where a counting function (defined by an automaton) is not zero. For example, one can decide whether a word equation with rational constraints has a solution where the number of occurrences of $a$ differs between variables $X$ and $Y$. - [598] arXiv:2405.07913 [pdf, ps, other]
-
Title: CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images. However, guiding the generative process of these models to consider detailed forms of conditioning reflecting style and/or structure information remains an open problem. In this paper, we present LoRAdapter, an approach that unifies both style and structure conditioning under the same formulation using a novel conditional LoRA block that enables zero-shot control. LoRAdapter is an efficient, powerful, and architecture-agnostic approach to condition text-to-image diffusion models, which enables fine-grained control conditioning during generation and outperforms recent state-of-the-art approaches
- [599] arXiv:2405.07914 [pdf, ps, other]
-
Title: Distribution Learning Meets Graph Structure SamplingComments: 48 pages, 2 figures. Shortened abstract as per arXiv criteriaSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an online learning framework.
We observe that if we apply the exponentially weighted average (EWA) or randomized weighted majority (RWM) forecasters on a sequence of samples from a distribution P using the log loss function, the average regret incurred by the forecaster's predictions can be used to bound the expected KL divergence between P and the predictions. Known regret bounds for EWA and RWM then yield new sample complexity bounds for learning Bayes nets. Moreover, these algorithms can be made computationally efficient for several interesting classes of Bayes nets. Specifically, we give a new sample-optimal and polynomial time learning algorithm with respect to trees of unknown structure and the first polynomial sample and time algorithm for learning with respect to Bayes nets over a given chordal skeleton. - [600] arXiv:2405.07916 [pdf, ps, other]
-
Title: IMAFD: An Interpretable Multi-stage Approach to Flood Detection from time series Multispectral DataSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In this paper, we address two critical challenges in the domain of flood detection: the computational expense of large-scale time series change detection and the lack of interpretable decision-making processes on explainable AI (XAI). To overcome these challenges, we proposed an interpretable multi-stage approach to flood detection, IMAFD has been proposed. It provides an automatic, efficient and interpretable solution suitable for large-scale remote sensing tasks and offers insight into the decision-making process. The proposed IMAFD approach combines the analysis of the dynamic time series image sequences to identify images with possible flooding with the static, within-image semantic segmentation. It combines anomaly detection (at both image and pixel level) with semantic segmentation. The flood detection problem is addressed through four stages: (1) at a sequence level: identifying the suspected images (2) at a multi-image level: detecting change within suspected images (3) at an image level: semantic segmentation of images into Land, Water or Cloud class (4) decision making. Our contributions are two folder. First, we efficiently reduced the number of frames to be processed for dense change detection by providing a multi-stage holistic approach to flood detection. Second, the proposed semantic change detection method (stage 3) provides human users with an interpretable decision-making process, while most of the explainable AI (XAI) methods provide post hoc explanations. The evaluation of the proposed IMAFD framework was performed on three datasets, WorldFloods, RavAEn and MediaEval. For all the above datasets, the proposed framework demonstrates a competitive performance compared to other methods offering also interpretability and insight.
- [601] arXiv:2405.07917 [pdf, ps, other]
-
Title: High-level Stream Processing: A Complementary Analysis of Fault RecoveryComments: Extended paper version. arXiv admin note: substantial text overlap with arXiv:2404.06203Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Software Engineering (cs.SE)
Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software architectural style. Several software systems rely on stream processing to deliver scalable performance, whereas open-source frameworks provide coding abstraction and high-level parallel computing. Although stream processing's performance is being extensively studied, the measurement of fault tolerance--a key abstraction offered by stream processing frameworks--has still not been adequately measured with comprehensive testbeds. In this work, we extend the previous fault recovery measurements with an exploratory analysis of the configuration space, additional experimental measurements, and analysis of improvement opportunities. We focus on robust deployment setups inspired by requirements for near real-time analytics of a large cloud observability platform. The results indicate significant potential for improving fault recovery and performance. However, these improvements entail grappling with configuration complexities, particularly in identifying and selecting the configurations to be fine-tuned and determining the appropriate values for them. Therefore, new abstractions for transparent configuration tuning are also needed for large-scale industry setups. We believe that more software engineering efforts are needed to provide insights into potential abstractions and how to achieve them. The stream processing community and industry practitioners could also benefit from more interactions with the high-level parallel programming community, whose expertise and insights on making parallel programming more productive and efficient could be extended.
- [602] arXiv:2405.07919 [pdf, ps, other]
-
Title: Exploring the Low-Pass Filtering Behavior in Image Super-ResolutionComments: Accepted by ICML 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep neural networks for image super-resolution have shown significant advantages over traditional approaches like interpolation. However, they are often criticized as `black boxes' compared to traditional approaches which have solid mathematical foundations. In this paper, we attempt to interpret the behavior of deep neural networks using theories from signal processing theories. We first report an intriguing phenomenon, referred to as `the sinc phenomenon,' which occurs when an impulse input is fed to a neural network. Building on this observation, we propose a method named Hybird Response Analysis (HyRA) to analyze the behavior of neural networks in image super-resolution tasks. In details, HyRA decomposes a neural network into a parallel connection of a linear system and a non-linear system, demonstrating that the linear system functions as a low-pass filter, while the non-linear system injects high-frequency information. Furthermore, to quantify the injected high-frequency information, we introduce a metric for image-to-image tasks called Frequency Spectrum Distribution Similarity (FSDS). FSDS reflects the distribution similarity of different frequency components, capturing nuances that traditional metrics may overlook. Code for this work can be found in: this https URL.
- [603] arXiv:2405.07920 [pdf, ps, other]
-
Title: A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-rankingFerdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias HagenSubjects: Information Retrieval (cs.IR)
Cross-encoders distilled from large language models are more effective re-rankers than cross-encoders fine-tuned using manually labeled data. However, the distilled models do not reach the language model's effectiveness. We construct and release a new distillation dataset, named Rank-DistiLLM, to investigate whether insights from fine-tuning cross-encoders on manually labeled data -- hard-negative sampling, deep sampling, and listwise loss functions -- are transferable to large language model ranker distillation. Our dataset can be used to train cross-encoders that reach the effectiveness of large language models while being orders of magnitude more efficient. Code and data is available at: this https URL
- [604] arXiv:2405.07921 [pdf, ps, other]
-
Title: Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?Hari Chandana Kuchibhotla, Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, Vineeth N BalasubramanianSubjects: Computer Vision and Pattern Recognition (cs.CV)
Going beyond mere fine-tuning of vision-language models (VLMs), learnable prompt tuning has emerged as a promising, resource-efficient alternative. Despite their potential, effectively learning prompts faces the following challenges: (i) training in a low-shot scenario results in overfitting, limiting adaptability and yielding weaker performance on newer classes or datasets; (ii) prompt-tuning's efficacy heavily relies on the label space, with decreased performance in large class spaces, signaling potential gaps in bridging image and class concepts. In this work, we ask the question if better text semantics can help address these concerns. In particular, we introduce a prompt-tuning method that leverages class descriptions obtained from large language models (LLMs). Our approach constructs part-level description-guided views of both image and text features, which are subsequently aligned to learn more generalizable prompts. Our comprehensive experiments, conducted across 11 benchmark datasets, outperform established methods, demonstrating substantial improvements.
- [605] arXiv:2405.07922 [pdf, ps, other]
-
Title: Unfolding via Progressive Mesh ApproximationComments: 9 pagesSubjects: Graphics (cs.GR)
When folding a 3D object from a 2D material like paper, typically only an approximation of the original surface geometry is needed. Such an approximation can effectively be created by a (progressive) mesh simplification approach, e.g. using an edge collapse technique. Moreover, when searching for an unfolding of the object, this approximation is assumed to be fixed. In this work, we take a different route and allow the approximation to change while searching for an unfolding. This way, we increase the chances to overcome possible ununfoldability issues. To join the two concepts of mesh approximation and unfolding, our work combines the edge collapsing mesh simplification technique with a Tabu Unfolder, a robust mesh unfolding approach. We empirically show that this strategy performs faster and that it is more reliable than prior state of the art methods.
- [606] arXiv:2405.07925 [pdf, ps, other]
-
Title: Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID DataComments: International Workshop on Federated Foundation Models for the Web 2024 (FL@FM-TheWebConf'24)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
The proliferation of edge devices has brought Federated Learning (FL) to the forefront as a promising paradigm for decentralized and collaborative model training while preserving the privacy of clients' data. However, FL struggles with a significant performance reduction and poor convergence when confronted with Non-Independent and Identically Distributed (Non-IID) data distributions among participating clients. While previous efforts, such as client drift mitigation and advanced server-side model fusion techniques, have shown some success in addressing this challenge, they often overlook the root cause of the performance reduction - the absence of identical data accurately mirroring the global data distribution among clients. In this paper, we introduce Gen-FedSD, a novel approach that harnesses the powerful capability of state-of-the-art text-to-image foundation models to bridge the significant Non-IID performance gaps in FL. In Gen-FedSD, each client constructs textual prompts for each class label and leverages an off-the-shelf state-of-the-art pre-trained Stable Diffusion model to synthesize high-quality data samples. The generated synthetic data is tailored to each client's unique local data gaps and distribution disparities, effectively making the final augmented local data IID. Through extensive experimentation, we demonstrate that Gen-FedSD achieves state-of-the-art performance and significant communication cost savings across various datasets and Non-IID settings.
- [607] arXiv:2405.07930 [pdf, ps, other]
-
Title: Improving Multimodal Learning with Multi-Loss Gradient ModulationSubjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Learning from multiple modalities, such as audio and video, offers opportunities for leveraging complementary information, enhancing robustness, and improving contextual understanding and performance. However, combining such modalities presents challenges, especially when modalities differ in data structure, predictive contribution, and the complexity of their learning processes. It has been observed that one modality can potentially dominate the learning process, hindering the effective utilization of information from other modalities and leading to sub-optimal model performance. To address this issue the vast majority of previous works suggest to assess the unimodal contributions and dynamically adjust the training to equalize them. We improve upon previous work by introducing a multi-loss objective and further refining the balancing process, allowing it to dynamically adjust the learning pace of each modality in both directions, acceleration and deceleration, with the ability to phase out balancing effects upon convergence. We achieve superior results across three audio-video datasets: on CREMA-D, models with ResNet backbone encoders surpass the previous best by 1.9% to 12.4%, and Conformer backbone models deliver improvements ranging from 2.8% to 14.1% across different fusion methods. On AVE, improvements range from 2.7% to 7.7%, while on UCF101, gains reach up to 6.1%.
- [608] arXiv:2405.07932 [pdf, ps, other]
-
Title: PARDEN, Can You Repeat That? Defending against Jailbreaks via RepetitionComments: Accepted at ICML 20224Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models (LLMs) have shown success in many natural language processing tasks. Despite rigorous safety alignment processes, supposedly safety-aligned LLMs like Llama 2 and Claude 2 are still susceptible to jailbreaks, leading to security risks and abuse of the models. One option to mitigate such risks is to augment the LLM with a dedicated "safeguard", which checks the LLM's inputs or outputs for undesired behaviour. A promising approach is to use the LLM itself as the safeguard. Nonetheless, baseline methods, such as prompting the LLM to self-classify toxic content, demonstrate limited efficacy. We hypothesise that this is due to domain shift: the alignment training imparts a self-censoring behaviour to the model ("Sorry I can't do that"), while the self-classify approach shifts it to a classification format ("Is this prompt malicious"). In this work, we propose PARDEN, which avoids this domain shift by simply asking the model to repeat its own outputs. PARDEN neither requires finetuning nor white box access to the model. We empirically verify the effectiveness of our method and show that PARDEN significantly outperforms existing jailbreak detection baselines for Llama-2 and Claude-2. Code and data are available at this https URL.
We find that PARDEN is particularly powerful in the relevant regime of high True Positive Rate (TPR) and low False Positive Rate (FPR). For instance, for Llama2-7B, at TPR equal to 90%, PARDEN accomplishes a roughly 11x reduction in the FPR from 24.8% to 2.0% on the harmful behaviours dataset. - [609] arXiv:2405.07933 [pdf, ps, other]
-
Title: Authentic Hand Avatar from a Phone Scan via Universal Hand ModelComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
The authentic 3D hand avatar with every identifiable information, such as hand shapes and textures, is necessary for immersive experiences in AR/VR. In this paper, we present a universal hand model (UHM), which 1) can universally represent high-fidelity 3D hand meshes of arbitrary identities (IDs) and 2) can be adapted to each person with a short phone scan for the authentic hand avatar. For effective universal hand modeling, we perform tracking and modeling at the same time, while previous 3D hand models perform them separately. The conventional separate pipeline suffers from the accumulated errors from the tracking stage, which cannot be recovered in the modeling stage. On the other hand, ours does not suffer from the accumulated errors while having a much more concise overall pipeline. We additionally introduce a novel image matching loss function to address a skin sliding during the tracking and modeling, while existing works have not focused on it much. Finally, using learned priors from our UHM, we effectively adapt our UHM to each person's short phone scan for the authentic hand avatar.
- [610] arXiv:2405.07937 [pdf, ps, other]
-
Title: Active Learning with Simple QuestionsComments: To appear at COLT 2024Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
We consider an active learning setting where a learner is presented with a pool S of n unlabeled examples belonging to a domain X and asks queries to find the underlying labeling that agrees with a target concept h^* \in H.
In contrast to traditional active learning that queries a single example for its label, we study more general region queries that allow the learner to pick a subset of the domain T \subset X and a target label y and ask a labeler whether h^*(x) = y for every example in the set T \cap S.
Such more powerful queries allow us to bypass the limitations of traditional active learning and use significantly fewer rounds of interactions to learn but can potentially lead to a significantly more complex query language. Our main contribution is quantifying the trade-off between the number of queries and the complexity of the query language used by the learner.
We measure the complexity of the region queries via the VC dimension of the family of regions. We show that given any hypothesis class H with VC dimension d, one can design a region query family Q with VC dimension O(d) such that for every set of n examples S \subset X and every h^* \in H, a learner can submit O(d log n) queries from Q to a labeler and perfectly label S. We show a matching lower bound by designing a hypothesis class H with VC dimension d and a dataset S \subset X of size n such that any learning algorithm using any query class with VC dimension O(d) must make poly(n) queries to label S perfectly.
Finally, we focus on well-studied hypothesis classes including unions of intervals, high-dimensional boxes, and d-dimensional halfspaces, and obtain stronger results. In particular, we design learning algorithms that (i) are computationally efficient and (ii) work even when the queries are not answered based on the learner's pool of examples S but on some unknown superset L of S - [611] arXiv:2405.07938 [pdf, ps, other]
-
Title: EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential ReasoningSubjects: Computation and Language (cs.CL)
In this paper, we introduce EconLogicQA, a rigorous benchmark designed to assess the sequential reasoning capabilities of large language models (LLMs) within the intricate realms of economics, business, and supply chain management. Diverging from traditional benchmarks that predict subsequent events individually, EconLogicQA poses a more challenging task: it requires models to discern and sequence multiple interconnected events, capturing the complexity of economic logics. EconLogicQA comprises an array of multi-event scenarios derived from economic articles, which necessitate an insightful understanding of both temporal and logical event relationships. Through comprehensive evaluations, we exhibit that EconLogicQA effectively gauges a LLM's proficiency in navigating the sequential complexities inherent in economic contexts. We provide a detailed description of EconLogicQA dataset and shows the outcomes from evaluating the benchmark across various leading-edge LLMs, thereby offering a thorough perspective on their sequential reasoning potential in economic contexts. Our benchmark dataset is available at this https URL.
- [612] arXiv:2405.07940 [pdf, ps, other]
-
Title: RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text DetectorsLiam Dugan, Alyssa Hwang, Filip Trhlik, Josh Magnus Ludan, Andrew Zhu, Hainiu Xu, Daphne Ippolito, Chris Callison-BurchComments: To appear at ACL 2024Subjects: Computation and Language (cs.CL)
Many commercial and open-source models claim to detect machine-generated text with very high accuracy (99\% or higher). However, very few of these detectors are evaluated on shared benchmark datasets and even when they are, the datasets used for evaluation are insufficiently challenging -- lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open- and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our dataset and tools to encourage further exploration into detector robustness.
- [613] arXiv:2405.07941 [pdf, ps, other]
-
Title: Efficient and Universal Merkle Tree Inclusion Proofs via OR AggregationOleksandr Kuznetsov, Alex Rusnak, Anton Yezhov, Dzianis Kanonik, Kateryna Kuznetsova, Oleksandr DominSubjects: Cryptography and Security (cs.CR)
Zero-knowledge proofs have emerged as a powerful tool for enhancing privacy and security in blockchain applications. However, the efficiency and scalability of proof systems remain a significant challenge, particularly in the context of Merkle tree inclusion proofs. Traditional proof aggregation techniques based on AND logic suffer from high verification complexity and data communication overhead, limiting their practicality for large-scale applications. In this paper, we propose a novel proof aggregation approach based on OR logic, which enables the generation of compact and universally verifiable proofs for Merkle tree inclusion. By aggregating proofs using OR logic, we achieve a proof size that is independent of the number of leaves in the tree, and verification can be performed using any single valid leaf hash. This represents a significant improvement over AND aggregation, which requires the verifier to process all leaf hashes. We formally define the OR aggregation logic, describe the process of generating universal proofs, and provide a comparative analysis demonstrating the advantages of our approach in terms of proof size, verification data, and universality. Furthermore, we discuss the potential of combining OR and AND aggregation logics to create complex acceptance functions, enabling the development of expressive and efficient proof systems for various blockchain applications. The proposed techniques have the potential to significantly enhance the scalability, efficiency, and flexibility of zero-knowledge proof systems, paving the way for more practical and adaptive solutions in the blockchain ecosystem.
- [614] arXiv:2405.07943 [pdf, ps, other]
-
Title: Hierarchical Decision MambaSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recent advancements in imitation learning have been largely fueled by the integration of sequence models, which provide a structured flow of information to effectively mimic task behaviours. Currently, Decision Transformer (DT) and subsequently, the Hierarchical Decision Transformer (HDT), presented Transformer-based approaches to learn task policies. Recently, the Mamba architecture has shown to outperform Transformers across various task domains. In this work, we introduce two novel methods, Decision Mamba (DM) and Hierarchical Decision Mamba (HDM), aimed at enhancing the performance of the Transformer models. Through extensive experimentation across diverse environments such as OpenAI Gym and D4RL, leveraging varying demonstration data sets, we demonstrate the superiority of Mamba models over their Transformer counterparts in a majority of tasks. Results show that HDM outperforms other methods in most settings. The code can be found at this https URL.
- [615] arXiv:2405.07946 [pdf, ps, other]
-
Title: TPMS2STEP: error-controlled and C2 continuity-preserving translation of TPMS models to STEP files based on constrained-PIASubjects: Computational Geometry (cs.CG)
Triply periodic minimal surface (TPMS) is emerging as an important way of designing microstructures. However, there has been limited use of commercial CAD/CAM/CAE software packages for TPMS design and manufacturing. This is mainly because TPMS is consistently described in the functional representation (F-rep) format, while modern CAD/CAM/CAE tools are built upon the boundary representation (B-rep) format. One possible solution to this gap is translating TPMS to STEP, which is the standard data exchange format of CAD/CAM/CAE. Following this direction, this paper proposes a new translation method with error-controlling and $C^2$ continuity-preserving features. It is based on an approximation error-driven TPMS sampling algorithm and a constrained-PIA algorithm. The sampling algorithm controls the deviation between the original and translated models. With it, an error bound of $2\epsilon$ on the deviation can be ensured if two conditions called $\epsilon$-density and $\epsilon$-approximation are satisfied. The constrained-PIA algorithm enforces $C^2$ continuity constraints during TPMS approximation, and meanwhile attaining high efficiency. A theoretical convergence proof of this algorithm is also given. The effectiveness of the translation method has been demonstrated by a series of examples and comparisons.
- [616] arXiv:2405.07948 [pdf, ps, other]
-
Title: Scene Action Maps: Behavioural Maps for Navigation without Metric InformationComments: ICRA 2024Subjects: Robotics (cs.RO)
Humans are remarkable in their ability to navigate without metric information. We can read abstract 2D maps, such as floor-plans or hand-drawn sketches, and use them to navigate in unseen rich 3D environments, without requiring prior traversals to map out these scenes in detail. We posit that this is enabled by the ability to represent the environment abstractly as interconnected navigational behaviours, e.g., "follow the corridor" or "turn right", while avoiding detailed, accurate spatial information at the metric level. We introduce the Scene Action Map (SAM), a behavioural topological graph, and propose a learnable map-reading method, which parses a variety of 2D maps into SAMs. Map-reading extracts salient information about navigational behaviours from the overlooked wealth of pre-existing, abstract and inaccurate maps, ranging from floor-plans to sketches. We evaluate the performance of SAMs for navigation, by building and deploying a behavioural navigation stack on a quadrupedal robot. Videos and more information is available at: this https URL.
- [617] arXiv:2405.07949 [pdf, ps, html, other]
-
Title: Online Load and Graph Balancing for Random Order InputsSubjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Online load balancing for heterogeneous machines aims to minimize the makespan (maximum machine workload) by scheduling arriving jobs with varying sizes on different machines. In the adversarial setting, where an adversary chooses not only the collection of job sizes but also their arrival order, the problem is well-understood and the optimal competitive ratio is known to be $\Theta(\log m)$ where $m$ is the number of machines. In the more realistic random arrival order model, the understanding is limited. Previously, the best lower bound on the competitive ratio was only $\Omega(\log \log m)$.
We significantly improve this bound by showing an $\Omega( \sqrt {\log m})$ lower bound, even for the restricted case where each job has a unit size on two machines and infinite size on the others. On the positive side, we propose an $O(\log m/\log \log m)$-competitive algorithm, demonstrating that better performance is possible in the random arrival model. - [618] arXiv:2405.07953 [pdf, ps, html, other]
-
Title: On the Decidability of Monadic Second-Order Logic with Arithmetic PredicatesComments: 17 pagesSubjects: Logic in Computer Science (cs.LO)
We investigate the decidability of the monadic second-order (MSO) theory of the structure $\langle \mathbb{N};<,P_1, \ldots,P_k \rangle$, for various unary predicates $P_1,\ldots,P_k \subseteq \mathbb{N}$. We focus in particular on "arithmetic" predicates arising in the study of linear recurrence sequences, such as fixed-base powers $\mathsf{Pow}_k = \{k^n : n \in \mathbb{N}\}$, $k$-th powers $\mathsf{N}_k = \{n^k : n \in \mathbb{N}\}$, and the set of terms of the Fibonacci sequence $\mathsf{Fib} = \{0,1,2,3,5,8,13,\ldots\}$ (and similarly for other linear recurrence sequences having a single, non-repeated, dominant characteristic root). We obtain several new unconditional and conditional decidability results, a select sample of which are the following:
$\bullet$ The MSO theory of $\langle \mathbb{N};<,\mathsf{Pow}_2, \mathsf{Fib} \rangle$ is decidable;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_2, \mathsf{Pow}_3, \mathsf{Pow}_6 \rangle$ is decidable;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_2, \mathsf{Pow}_3, \mathsf{Pow}_5 \rangle$ is decidable assuming Schanuel's conjecture;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_4, \mathsf{N}_2 \rangle$ is decidable;
$\bullet$ The MSO theory of $\langle \mathbb{N};<, \mathsf{Pow}_2, \mathsf{N}_2 \rangle$ is Turing-equivalent to the MSO theory of $\langle \mathbb{N};<,S \rangle$, where $S$ is the predicate corresponding to the binary expansion of $\sqrt{2}$. (As the binary expansion of $\sqrt{2}$ is widely believed to be normal, the corresponding MSO theory is in turn expected to be decidable.)
These results are obtained by exploiting and combining techniques from dynamical systems, number theory, and automata theory. - [619] arXiv:2405.07960 [pdf, ps, other]
-
Title: AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environmentsSubjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
Diagnosing and managing a patient is a complex, sequential decision making process that requires physicians to obtain information -- such as which tests to perform -- and to act upon it. Recent advances in artificial intelligence (AI) and large language models (LLMs) promise to profoundly impact clinical care. However, current evaluation schemes overrely on static medical question-answering benchmarks, falling short on interactive decision-making that is required in real-life clinical work. Here, we present AgentClinic: a multimodal benchmark to evaluate LLMs in their ability to operate as agents in simulated clinical environments. In our benchmark, the doctor agent must uncover the patient's diagnosis through dialogue and active data collection. We present two open benchmarks: a multimodal image and dialogue environment, AgentClinic-NEJM, and a dialogue-only environment, AgentClinic-MedQA. We embed cognitive and implicit biases both in patient and doctor agents to emulate realistic interactions between biased agents. We find that introducing bias leads to large reductions in diagnostic accuracy of the doctor agents, as well as reduced compliance, confidence, and follow-up consultation willingness in patient agents. Evaluating a suite of state-of-the-art LLMs, we find that several models that excel in benchmarks like MedQA are performing poorly in AgentClinic-MedQA. We find that the LLM used in the patient agent is an important factor for performance in the AgentClinic benchmark. We show that both having limited interactions as well as too many interaction reduces diagnostic accuracy in doctor agents. The code and data for this work is publicly available at this https URL.
- [620] arXiv:2405.07962 [pdf, ps, html, other]
-
Title: KG-Planner: Knowledge-Informed Graph Neural Planning for Collaborative ManipulatorsSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents a novel knowledge-informed graph neural planner (KG-Planner) to address the challenge of efficiently planning collision-free motions for robots in high-dimensional spaces, considering both static and dynamic environments involving humans. Unlike traditional motion planners that struggle with finding a balance between efficiency and optimality, the KG-Planner takes a different approach. Instead of relying solely on a neural network or imitating the motions of an oracle planner, our KG-Planner integrates explicit physical knowledge from the workspace. The integration of knowledge has two key aspects: (1) we present an approach to design a graph that can comprehensively model the workspace's compositional structure. The designed graph explicitly incorporates critical elements such as robot joints, obstacles, and their interconnections. This representation allows us to capture the intricate relationships between these elements. (2) We train a Graph Neural Network (GNN) that excels at generating nearly optimal robot motions. In particular, the GNN employs a layer-wise propagation rule to facilitate the exchange and update of information among workspace elements based on their connections. This propagation emphasizes the influence of these elements throughout the planning process. To validate the efficacy and efficiency of our KG-Planner, we conduct extensive experiments in both static and dynamic environments. These experiments include scenarios with and without human workers. The results of our approach are compared against existing methods, showcasing the superior performance of the KG-Planner. A short video introduction of this work is available (video link provided in the paper).
- [621] arXiv:2405.07963 [pdf, ps, other]
-
Title: PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented GenerationComments: 10 pages, 2 figures. The code is provided in github and the link to the repository is provided at the end of the publicationSubjects: Human-Computer Interaction (cs.HC)
The exponential growth of scientific literature has resulted in information overload, challenging researchers to effectively synthesize relevant publications. This paper explores the integration of traditional reference management software with advanced computational techniques, including Large Language Models and Retrieval-Augmented Generation. We introduce PyZoBot, an AI-driven platform developed in Python, incorporating Zoteros reference management with OpenAIs sophisticated LLMs. PyZoBot streamlines knowledge extraction and synthesis from extensive human-curated scientific literature databases. It demonstrates proficiency in handling complex natural language queries, integrating data from multiple sources, and meticulously presenting references to uphold research integrity and facilitate further exploration. By leveraging LLMs, RAG, and human expertise through a curated library, PyZoBot offers an effective solution to manage information overload and keep pace with rapid scientific advancements. The development of such AI-enhanced tools promises significant improvements in research efficiency and effectiveness across various disciplines.
- [622] arXiv:2405.07966 [pdf, ps, other]
-
Title: OverlapMamba: Novel Shift State Space Model for LiDAR-based Place RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Place recognition is the foundation for enabling autonomous systems to achieve independent decision-making and safe operations. It is also crucial in tasks such as loop closure detection and global localization within SLAM. Previous methods utilize mundane point cloud representations as input and deep learning-based LiDAR-based Place Recognition (LPR) approaches employing different point cloud image inputs with convolutional neural networks (CNNs) or transformer architectures. However, the recently proposed Mamba deep learning model, combined with state space models (SSMs), holds great potential for long sequence modeling. Therefore, we developed OverlapMamba, a novel network for place recognition, which represents input range views (RVs) as sequences. In a novel way, we employ a stochastic reconstruction approach to build shift state space models, compressing the visual representation. Evaluated on three different public datasets, our method effectively detects loop closures, showing robustness even when traversing previously visited locations from different directions. Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed, indicating strong place recognition capabilities and real-time efficiency.
- [623] arXiv:2405.07969 [pdf, ps, other]
-
Title: Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Zero-shot anomaly segmentation using pre-trained foundation models is a promising approach that enables effective algorithms without expensive, domain-specific training or fine-tuning. Ensuring that these methods work across various environmental conditions and are robust to distribution shifts is an open problem. We investigate the performance of WinCLIP [14] zero-shot anomaly segmentation algorithm by perturbing test data using three semantic transformations: bounded angular rotations, bounded saturation shifts, and hue shifts. We empirically measure a lower performance bound by aggregating across per-sample worst-case perturbations and find that average performance drops by up to 20% in area under the ROC curve and 40% in area under the per-region overlap curve. We find that performance is consistently lowered on three CLIP backbones, regardless of model architecture or learning objective, demonstrating a need for careful performance evaluation.
- [624] arXiv:2405.07973 [pdf, ps, html, other]
-
Title: A Natural Formalized Proof LanguageSubjects: Programming Languages (cs.PL)
Artificial intelligence assisted mathematical proof has become a highly focused area nowadays. One key problem in this field is to generate formal mathematical proofs from natural language proofs. Due to historical reasons, the formal proof languages adopted by traditional theorem provers were not intended to represent natural language proofs. Therefore, they are not well-suited for the aforementioned tasks and proof-checking work for educational purposes. In this paper, we design a proof language and its corresponding abstract syntax tree and implement a proof checking tool for it. This language can be easily converted from natural language, thus providing a rich corpus of formal proof. Additionally, it supports the handling of issues in informal proofs through static analysis, and enhances the expressive power of the language by introducing the structure of partial proofs. This design combines the expressiveness of natural language and the accuracy of formal language, resulting in an improved mathematical proof language.
- [625] arXiv:2405.07974 [pdf, ps, other]
-
Title: SignAvatar: Sign Language 3D Motion Reconstruction and GenerationComments: Accepted by FG2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Achieving expressive 3D motion reconstruction and automatic generation for isolated sign words can be challenging, due to the lack of real-world 3D sign-word data, the complex nuances of signing motions, and the cross-modal understanding of sign language semantics. To address these challenges, we introduce SignAvatar, a framework capable of both word-level sign language reconstruction and generation. SignAvatar employs a transformer-based conditional variational autoencoder architecture, effectively establishing relationships across different semantic modalities. Additionally, this approach incorporates a curriculum learning strategy to enhance the model's robustness and generalization, resulting in more realistic motions. Furthermore, we contribute the ASL3DWord dataset, composed of 3D joint rotation data for the body, hands, and face, for unique sign words. We demonstrate the effectiveness of SignAvatar through extensive experiments, showcasing its superior reconstruction and automatic generation capabilities. The code and dataset are available on the project page.
- [626] arXiv:2405.07975 [pdf, ps, html, other]
-
Title: Dynamic Programming for Symbolic Boolean Realizability and SynthesisComments: 33 pages including the Appendix and bibliography, 5 figures, paper is to be published in CAV 2024, but this version is inclusive of the AppendixSubjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Inspired by recent progress in dynamic programming approaches for weighted model counting, we investigate a dynamic-programming approach in the context of boolean realizability and synthesis, which takes a conjunctive-normal-form boolean formula over input and output variables, and aims at synthesizing witness functions for the output variables in terms of the inputs. We show how graded project-join trees, obtained via tree decomposition, can be used to compute a BDD representing the realizability set for the input formulas in a bottom-up order. We then show how the intermediate BDDs generated during realizability checking phase can be applied to synthesizing the witness functions in a top-down manner. An experimental evaluation of a solver -- DPSynth -- based on these ideas demonstrates that our approach for Boolean realizabilty and synthesis has superior time and space performance over a heuristics-based approach using same symbolic representations. We discuss the advantage on scalability of the new approach, and also investigate our findings on the performance of the DP framework.
- [627] arXiv:2405.07980 [pdf, ps, other]
-
Title: Generalizing Quantum Tanner CodesSubjects: Information Theory (cs.IT)
In this work, we present a generalization of the recently proposed quantum Tanner codes by Leverrier and Zémor, which contains a construction of asymptotically good quantum LDPC codes. Quantum Tanner codes have so far been constructed equivalently from groups, Cayley graphs, or square complexes constructed from groups. We show how to enlarge this to group actions on finite sets, Schreier graphs, and a family of square complexes which is the largest possible in a certain sense. Furthermore, we discuss how the proposed generalization opens up the possibility of finding other families of asymptotically good quantum codes.
- [628] arXiv:2405.07981 [pdf, ps, other]
-
Title: Diagnosing and Predicting Autonomous Vehicle Operational Safety Using Multiple Simulation Modalities and a Virtual EnvironmentComments: Preprint. Under ReviewSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Even as technology and performance gains are made in the sphere of automated driving, safety concerns remain. Vehicle simulation has long been seen as a tool to overcome the cost associated with a massive amount of on-road testing for development and discovery of safety critical "edge-cases". However, purely software-based vehicle models may leave a large realism gap between their real-world counterparts in terms of dynamic response, and highly realistic vehicle-in-the-loop (VIL) simulations that encapsulate a virtual world around a physical vehicle may still be quite expensive to produce and similarly time intensive as on-road testing. In this work, we demonstrate an AV simulation test bed that combines the realism of vehicle-in-the-loop (VIL) simulation with the ease of implementation of model-in-the-loop (MIL) simulation. The setup demonstrated in this work allows for response diagnosis for the VIL simulations. By observing causal links between virtual weather and lighting conditions that surround the virtual depiction of our vehicle, the vision-based perception model and controller of Openpilot, and the dynamic response of our physical vehicle under test, we can draw conclusions regarding how the perceived environment contributed to vehicle response. Conversely, we also demonstrate response prediction for the MIL setup, where the need for a physical vehicle is not required to draw richer conclusions around the impact of environmental conditions on AV performance than could be obtained with VIL simulation alone. These combine for a simulation setup with accurate real-world implications for edge-case discovery that is both cost effective and time efficient to implement.
- [629] arXiv:2405.07982 [pdf, ps, other]
-
Title: Enhancing Rover Mobility Monitoring: Autoencoder-driven Anomaly Detection for CuriosityComments: To be published in IEEE Aerospace Conference 2024Subjects: Robotics (cs.RO)
Over eleven years into its mission, the Mars Science Laboratory remains vital to NASA's Mars exploration. Safeguarding the rover's long-term functionality is a top mission priority. In this study, we introduce and test undercomplete autoencoder models for detecting drive anomalies, using telemetry data from wheel actuators, the Rover Inertial Measurement Unit (RIMU), and the suspension system. Our approach enhances post-drive data analysis during tactical downlink sessions. We explore various model architectures and input features to understand their impact on performance. Evaluating the models involves testing them on unseen data to mimic real-world scenarios. Our experiments demonstrate the undercomplete autoencoder model's effectiveness in detecting drive anomalies within the Curiosity rover dataset. Remarkably, the model even identifies subtle anomalous telemetry patterns missed by human operators. Additionally, we provide insights into optimal design choices by comparing different model architectures and input features. The model's ability to capture inconspicuous anomalies, potentially indicating early-stage failures, holds promise for the field, by improving the reliability and safety of future planetary exploration missions through early anomaly detection and proactive maintenance.
- [630] arXiv:2405.07987 [pdf, ps, other]
-
Title: The Platonic Representation HypothesisComments: Equal contributionsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
- [631] arXiv:2405.07988 [pdf, ps, other]
-
Title: A Generalist Learner for Multifaceted Medical Image InterpretationComments: Technical studySubjects: Computer Vision and Pattern Recognition (cs.CV)
Current medical artificial intelligence systems are often limited to narrow applications, hindering their widespread adoption in clinical practice. To address this limitation, we propose MedVersa, a generalist learner that enables flexible learning and tasking for medical image interpretation. By leveraging a large language model as a learnable orchestrator, MedVersa can learn from both visual and linguistic supervision, support multimodal inputs, and perform real-time task specification. This versatility allows MedVersa to adapt to various clinical scenarios and perform multifaceted medical image analysis. We introduce MedInterp, the largest multimodal dataset to date for medical image interpretation, consisting of over 13 million annotated instances spanning 11 tasks across 3 modalities, to support the development of MedVersa. Our experiments demonstrate that MedVersa achieves state-of-the-art performance in 9 tasks, sometimes outperforming specialist counterparts by over 10%. MedVersa is the first to showcase the viability of multimodal generative medical AI in implementing multimodal outputs, inputs, and dynamic task specification, highlighting its potential as a multifunctional system for comprehensive medical image analysis. This generalist approach to medical image interpretation paves the way for more adaptable and efficient AI-assisted clinical decision-making.
- [632] arXiv:2405.07990 [pdf, ps, html, other]
-
Title: Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific PlotsSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
The remarkable progress of Multi-modal Large Language Models (MLLMs) has attracted significant attention due to their superior performance in visual contexts. However, their capabilities in turning visual figure to executable code, have not been evaluated thoroughly. To address this, we introduce Plot2Code, a comprehensive visual coding benchmark designed for a fair and in-depth assessment of MLLMs. We carefully collect 132 manually selected high-quality matplotlib plots across six plot types from publicly available matplotlib galleries. For each plot, we carefully offer its source code, and an descriptive instruction summarized by GPT-4. This approach enables Plot2Code to extensively evaluate MLLMs' code capabilities across various input modalities. Furthermore, we propose three automatic evaluation metrics, including code pass rate, text-match ratio, and GPT-4V overall rating, for a fine-grained assessment of the output code and rendered images. Instead of simply judging pass or fail, we employ GPT-4V to make an overall judgement between the generated and reference images, which has been shown to be consistent with human evaluation. The evaluation results, which include analyses of 14 MLLMs such as the proprietary GPT-4V, Gemini-Pro, and the open-sourced Mini-Gemini, highlight the substantial challenges presented by Plot2Code. With Plot2Code, we reveal that most existing MLLMs struggle with visual coding for text-dense plots, heavily relying on textual instruction. We hope that the evaluation results from Plot2Code on visual coding will guide the future development of MLLMs. All data involved with Plot2Code are available at this https URL.
- [633] arXiv:2405.07991 [pdf, ps, other]
-
Title: SPIN: Simultaneous Perception, Interaction and NavigationComments: In CVPR 2024. Website at this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
While there has been remarkable progress recently in the fields of manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must make a diverse range of long-horizon tasks feasible in unstructured and dynamic environments. While the applications are broad and interesting, there are a plethora of challenges in developing these systems such as coordination between the base and arm, reliance on onboard perception for perceiving and interacting with the environment, and most importantly, simultaneously integrating all these parts together. Prior works approach the problem using disentangled modular skills for mobility and manipulation that are trivially tied together. This causes several limitations such as compounding errors, delays in decision-making, and no whole-body coordination. In this work, we present a reactive mobile manipulation framework that uses an active visual system to consciously perceive and react to its environment. Similar to how humans leverage whole-body and hand-eye coordination, we develop a mobile manipulator that exploits its ability to move and see, more specifically -- to move in order to see and to see in order to move. This allows it to not only move around and interact with its environment but also, choose "when" to perceive "what" using an active visual system. We observe that such an agent learns to navigate around complex cluttered scenarios while displaying agile whole-body coordination using only ego-vision without needing to create environment maps. Results visualizations and videos at this https URL
- [634] arXiv:2405.07992 [pdf, ps, other]
-
Title: MambaOut: Do We Really Need Mamba for Vision?Comments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Mamba, an architecture with RNN-like token mixer of state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba, and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, as image classification does not align with either characteristic, we hypothesize that Mamba is not necessary for this task; Detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks. To empirically verify our hypotheses, we construct a series of models named \emph{MambaOut} through stacking Mamba blocks while removing their core token mixer, SSM. Experimental results strongly support our hypotheses. Specifically, our MambaOut model surpasses all visual Mamba models on ImageNet image classification, indicating that Mamba is indeed unnecessary for this task. As for detection and segmentation, MambaOut cannot match the performance of state-of-the-art visual Mamba models, demonstrating the potential of Mamba for long-sequence visual tasks. The code is available at this https URL
New submissions for Tuesday, 14 May 2024 (showing 634 of 634 entries )
- [635] arXiv:2405.06642 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: PPFlow: Target-aware Peptide Design with Torsional Flow MatchingComments: 18 pagesSubjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Therapeutic peptides have proven to have great pharmaceutical value and potential in recent decades. However, methods of AI-assisted peptide drug discovery are not fully explored. To fill the gap, we propose a target-aware peptide design method called \textsc{PPFlow}, based on conditional flow matching on torus manifolds, to model the internal geometries of torsion angles for the peptide structure design. Besides, we establish a protein-peptide binding dataset named PPBench2024 to fill the void of massive data for the task of structure-based peptide drug design and to allow the training of deep learning methods. Extensive experiments show that PPFlow reaches state-of-the-art performance in tasks of peptide drug generation and optimization in comparison with baseline models, and can be generalized to other tasks including docking and side-chain packing.
- [636] arXiv:2405.06645 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: On Recovering Higher-order Interactions from Protein Language ModelsSubjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Protein language models leverage evolutionary information to perform state-of-the-art 3D structure and zero-shot variant prediction. Yet, extracting and explaining all the mutational interactions that govern model predictions remains difficult as it requires querying the entire amino acid space for $n$ sites using $20^n$ sequences, which is computationally expensive even for moderate values of $n$ (e.g., $n\sim10$). Although approaches to lower the sample complexity exist, they often limit the interpretability of the model to just single and pairwise interactions. Recently, computationally scalable algorithms relying on the assumption of sparsity in the Fourier domain have emerged to learn interactions from experimental data. However, extracting interactions from language models poses unique challenges: it's unclear if sparsity is always present or if it is the only metric needed to assess the utility of Fourier algorithms. Herein, we develop a framework to do a systematic Fourier analysis of the protein language model ESM2 applied on three proteins-green fluorescent protein (GFP), tumor protein P53 (TP53), and G domain B1 (GB1)-across various sites for 228 experiments. We demonstrate that ESM2 is dominated by three regions in the sparsity-ruggedness plane, two of which are better suited for sparse Fourier transforms. Validations on two sample proteins demonstrate recovery of all interactions with $R^2=0.72$ in the more sparse region and $R^2=0.66$ in the more dense region, using only 7 million out of $20^{10}\sim10^{13}$ ESM2 samples, reducing the computational time by a staggering factor of 15,000. All codes and data are available on our GitHub repository this https URL.
- [637] arXiv:2405.06649 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction PredictionMingyu Jin, Haochen Xue, Zhenting Wang, Boming Kang, Ruosong Ye, Kaixiong Zhou, Mengnan Du, Yongfeng ZhangSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG); Molecular Networks (q-bio.MN)
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases. Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions, ignoring the broader context of nonphysical connections through intermediate proteins, thus limiting their effectiveness. The emergence of Large Language Models (LLMs) provides a new opportunity for addressing this complex biological challenge. By transforming structured data into natural language prompts, we can map the relationships between proteins into texts. This approach allows LLMs to identify indirect connections between proteins, tracing the path from upstream to downstream. Therefore, we propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time. Specifically, we propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as natural language prompts. ProCoT considers a signaling pathway as a protein reasoning process, which starts from upstream proteins and passes through several intermediate proteins to transmit biological signals to downstream proteins. Thus, we can use ProCoT to predict the interaction between upstream proteins and downstream proteins. The training of ProLLM employs the ProCoT format, which enhances the model's understanding of complex biological problems. In addition to ProCoT, this paper also contributes to the exploration of embedding replacement of protein sites in natural language prompts, and instruction fine-tuning in protein knowledge datasets. We demonstrate the efficacy of ProLLM through rigorous validation against benchmark datasets, showing significant improvement over existing methods in terms of prediction accuracy and generalizability. The code is available at: this https URL.
- [638] arXiv:2405.06651 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: Using GANs for De Novo Protein Design Targeting Microglial IL-3R$\alpha$ to Inhibit Alzheimer's ProgressionComments: 9 pages totalSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
IL-3 is a hemopoietic growth factor that usually targets blood cell precursors; IL-3R is a cytokine receptor that binds to IL-3. However, IL-3 takes on a different role in the context of glial cells in the nervous system, where studies show that the protein IL-3 protects against Alzheimer's disease by activating microglia at their IL-3R receptors, causing the microglia to clear out the tangles caused by the build-up of misfolded Tau proteins. In this study, we seek to ascertain what role the secondary structure of IL-3 plays in its binding with the receptor. The motivation behind this study is to learn more about the mechanism and identify possible drugs that might be able to activate it, in hopes of inhibiting the spread of Alzheimer's Disease. From a preliminary analysis of complexes containing IL-3 and IL-3R, we hypothesized that the binding is largely due to the interactions of three alpha helix structures stretching towards the active site on the receptor. The original Il-3 protein serves as the control in this experiment; the other proteins being tested are generated through several types of computational de novo protein design, where machine learning allows for the production of entirely novel structures. The efficacy of the generated proteins is assessed through docking simulations with the IL-3R receptor, and the binding poses are also qualitatively examined to gain insight into the function of the binding. From the docking data and poses, the most successful proteins were those with similar secondary structure to IL-3.
- [639] arXiv:2405.06653 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: A unified cross-attention model for predicting antigen binding specificity to both HLA and TCR moleculesSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
The immune checkpoint inhibitors have demonstrated promising clinical efficacy across various tumor types, yet the percentage of patients who benefit from them remains low. The binding affinity between antigens and HLA-I/TCR molecules plays a critical role in antigen presentation and T-cell activation. Some computational methods have been developed to predict antigen-HLA or antigen-TCR binding specificity, but they focus solely on one task at a time. In this paper, we propose UnifyImmun, a unified cross-attention transformer model designed to simultaneously predicts the binding of antigens to both HLA and TCR molecules, thereby providing more comprehensive evaluation of antigen immunogenicity. We devise a two-phase progressive training strategy that enables these two tasks to mutually reinforce each other, by compelling the encoders to extract more expressive features. To further enhance the model generalizability, we incorporate virtual adversarial training. Compared to over ten existing methods for predicting antigen-HLA and antigen-TCR binding, our method demonstrates better performance in both tasks. Notably, on a large-scale COVID-19 antigen-TCR binding test set, our method improves performance by at least 9% compared to the current state-of-the-art methods. The validation experiments on three clinical cohorts confirm that our approach effectively predicts immunotherapy response and clinical outcomes. Furthermore, the cross-attention scores reveal the amino acids sites critical for antigen binding to receptors. In essence, our approach marks a significant step towards comprehensive evaluation of antigen immunogenicity.
- [640] arXiv:2405.06654 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: PROflow: An iterative refinement model for PROTAC-induced structure predictionComments: Published at the GEM workshop, ICLR 2024Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Proteolysis targeting chimeras (PROTACs) are small molecules that trigger the breakdown of traditionally ``undruggable'' proteins by binding simultaneously to their targets and degradation-associated proteins. A key challenge in their rational design is understanding their structural basis of activity. Due to the lack of crystal structures (18 in the PDB), existing PROTAC docking methods have been forced to simplify the problem into a distance-constrained protein-protein docking task. To address the data issue, we develop a novel pseudo-data generation scheme that requires only binary protein-protein complexes. This new dataset enables PROflow, an iterative refinement model for PROTAC-induced structure prediction that models the full PROTAC flexibility during constrained protein-protein docking. PROflow outperforms the state-of-the-art across docking metrics and runtime. Its inference speed enables the large-scale screening of PROTAC designs, and computed properties of predicted structures achieve statistically significant correlations with published degradation activities.
- [641] arXiv:2405.06655 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: RNA Secondary Structure Prediction Using Transformer-Based Deep Learning ModelsSubjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The Human Genome Project has led to an exponential increase in data related to the sequence, structure, and function of biomolecules. Bioinformatics is an interdisciplinary research field that primarily uses computational methods to analyze large amounts of biological macromolecule data. Its goal is to discover hidden biological patterns and related information. Furthermore, analysing additional relevant information can enhance the study of biological operating mechanisms. This paper discusses the fundamental concepts of RNA, RNA secondary structure, and its prediction.Subsequently, the application of machine learning technologies in predicting the structure of biological macromolecules is explored. This chapter describes the relevant knowledge of algorithms and computational complexity and presents a RNA tertiary structure prediction algorithm based on ResNet. To address the issue of the current scoring function's unsuitability for long RNA, a scoring model based on ResNet is proposed, and a structure prediction algorithm is designed. The chapter concludes by presenting some open and interesting challenges in the field of RNA tertiary structure prediction.
- [642] arXiv:2405.06657 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: Molecular Docking via Weighted Subgraph Isomorphism on Quantum AnnealersEmanuele Triuzzi, Riccardo Mengoni, Domenico Bonanni, Daniele Ottaviani, Andrea Beccari, Gianluca PalermoSubjects: Biomolecules (q-bio.BM); Computational Engineering, Finance, and Science (cs.CE); Emerging Technologies (cs.ET)
Molecular docking is an essential step in the drug discovery process involving the detection of three-dimensional poses of a ligand inside the active site of the protein. In this paper, we address the Molecular Docking search phase by formulating the problem in QUBO terms, suitable for an annealing approach. We propose a problem formulation as a weighted subgraph isomorphism between the ligand graph and the grid of the target protein pocket. In particular, we applied a graph representation to the ligand embedding all the geometrical properties of the molecule including its flexibility, and we created a weighted spatial grid to the 3D space region inside the pocket. Results and performance obtained with quantum annealers are compared with classical simulated annealing solvers.
- [643] arXiv:2405.06658 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: ProteinEngine: Empower LLM with Domain Knowledge for Protein EngineeringSubjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) have garnered considerable attention for their proficiency in tackling intricate tasks, particularly leveraging their capacities for zero-shot and in-context learning. However, their utility has been predominantly restricted to general tasks due to an absence of domain-specific knowledge. This constraint becomes particularly pertinent in the realm of protein engineering, where specialized expertise is required for tasks such as protein function prediction, protein evolution analysis, and protein design, with a level of specialization that existing LLMs cannot furnish. In response to this challenge, we introduce \textsc{ProteinEngine}, a human-centered platform aimed at amplifying the capabilities of LLMs in protein engineering by seamlessly integrating a comprehensive range of relevant tools, packages, and software via API calls. Uniquely, \textsc{ProteinEngine} assigns three distinct roles to LLMs, facilitating efficient task delegation, specialized task resolution, and effective communication of results. This design fosters high extensibility and promotes the smooth incorporation of new algorithms, models, and features for future development. Extensive user studies, involving participants from both the AI and protein engineering communities across academia and industry, consistently validate the superiority of \textsc{ProteinEngine} in augmenting the reliability and precision of deep learning in protein engineering tasks. Consequently, our findings highlight the potential of \textsc{ProteinEngine} to bride the disconnected tools for future research in the protein engineering domain.
- [644] arXiv:2405.06659 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: ControlMol: Adding Substruture Control To Molecule Diffusion ModelsComments: 9 pages,7 figuresSubjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Designing new molecules is an important task in the field of pharmaceuticals. Due to the vast design space of molecules, generating molecules conditioned on a specific sub-structure relevant to a particular function or therapeutic target is a crucial task in computer-aided drug design. In this paper, we present ControlMol, which adds sub-structure control to molecule generation with diffusion models. Unlike previous methods which view this task as inpainting or conditional generation, we adopt the idea of ControlNet into conditional molecule generation and make adaptive adjustments to a pre-trained diffusion model. We apply our method to both 2D and 3D molecule generation tasks. Conditioned on randomly partitioned sub-structure data, our method outperforms previous methods by generating more valid and diverse molecules. The method is easy to implement and can be quickly applied to a variety of pre-trained molecule generation models.
- [645] arXiv:2405.06660 (cross-list from physics.ed-ph) [pdf, ps, other]
-
Title: AI and Machine Learning for Next Generation Science AssessmentsComments: 18 pages, book chapter, in the book: Jiao, H., & Lissitz, R. W. (Eds.). Machine learning, natural language processing and psychometrics. Charlotte, NC: Information Age PublisherSubjects: Physics Education (physics.ed-ph); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
This chapter focuses on the transformative role of Artificial Intelligence (AI) and Machine Learning (ML) in science assessments. The paper begins with a discussion of the Framework for K-12 Science Education, which calls for a shift from conceptual learning to knowledge-in-use. This shift necessitates the development of new types of assessments that align with the Framework's three dimensions: science and engineering practices, disciplinary core ideas, and crosscutting concepts. The paper further highlights the limitations of traditional assessment methods like multiple-choice questions, which often fail to capture the complexities of scientific thinking and three-dimensional learning in science. It emphasizes the need for performance-based assessments that require students to engage in scientific practices like modeling, explanation, and argumentation. The paper achieves three major goals: reviewing the current state of ML-based assessments in science education, introducing a framework for scoring accuracy in ML-based automatic assessments, and discussing future directions and challenges. It delves into the evolution of ML-based automatic scoring systems, discussing various types of ML, like supervised, unsupervised, and semi-supervised learning. These systems can provide timely and objective feedback, thus alleviating the burden on teachers. The paper concludes by exploring pre-trained models like BERT and finetuned ChatGPT, which have shown promise in assessing students' written responses effectively.
- [646] arXiv:2405.06662 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: Language Interaction Network for Clinical Trial Approval EstimationSubjects: Biomolecules (q-bio.BM); Computation and Language (cs.CL); Machine Learning (cs.LG)
Clinical trial outcome prediction seeks to estimate the likelihood that a clinical trial will successfully reach its intended endpoint. This process predominantly involves the development of machine learning models that utilize a variety of data sources such as descriptions of the clinical trials, characteristics of the drug molecules, and specific disease conditions being targeted. Accurate predictions of trial outcomes are crucial for optimizing trial planning and prioritizing investments in a drug portfolio. While previous research has largely concentrated on small-molecule drugs, there is a growing need to focus on biologics-a rapidly expanding category of therapeutic agents that often lack the well-defined molecular properties associated with traditional drugs. Additionally, applying conventional methods like graph neural networks to biologics data proves challenging due to their complex nature. To address these challenges, we introduce the Language Interaction Network (LINT), a novel approach that predicts trial outcomes using only the free-text descriptions of the trials. We have rigorously tested the effectiveness of LINT across three phases of clinical trials, where it achieved ROC-AUC scores of 0.770, 0.740, and 0.748 for phases I, II, and III, respectively, specifically concerning trials involving biologic interventions.
- [647] arXiv:2405.06663 (cross-list from q-bio.BM) [pdf, ps, html, other]
-
Title: Protein Representation Learning by Capturing Protein Sequence-Structure-Function RelationshipComments: ICLR 2024 MLGenX Workshop (Spotlight)Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The goal of protein representation learning is to extract knowledge from protein databases that can be applied to various protein-related downstream tasks. Although protein sequence, structure, and function are the three key modalities for a comprehensive understanding of proteins, existing methods for protein representation learning have utilized only one or two of these modalities due to the difficulty of capturing the asymmetric interrelationships between them. To account for this asymmetry, we introduce our novel asymmetric multi-modal masked autoencoder (AMMA). AMMA adopts (1) a unified multi-modal encoder to integrate all three modalities into a unified representation space and (2) asymmetric decoders to ensure that sequence latent features reflect structural and functional information. The experiments demonstrate that the proposed AMMA is highly effective in learning protein representations that exhibit well-aligned inter-modal relationships, which in turn makes it effective for various downstream protein-related tasks.
- [648] arXiv:2405.06672 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Liouville Flow Importance SamplerComments: 25 pages, 7 figures, 15 tables. Submitted to and accepted by the 41th International Conference on Machine Learning (Vienna, Austria)Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Data Analysis, Statistics and Probability (physics.data-an); Computation (stat.CO)
We present the Liouville Flow Importance Sampler (LFIS), an innovative flow-based model for generating samples from unnormalized density functions. LFIS learns a time-dependent velocity field that deterministically transports samples from a simple initial distribution to a complex target distribution, guided by a prescribed path of annealed distributions. The training of LFIS utilizes a unique method that enforces the structure of a derived partial differential equation to neural networks modeling velocity fields. By considering the neural velocity field as an importance sampler, sample weights can be computed through accumulating errors along the sample trajectories driven by neural velocity fields, ensuring unbiased and consistent estimation of statistical quantities. We demonstrate the effectiveness of LFIS through its application to a range of benchmark problems, on many of which LFIS achieved state-of-the-art performance.
- [649] arXiv:2405.06690 (cross-list from q-bio.BM) [pdf, ps, html, other]
-
Title: DrugLLM: Open Large Language Model for Few-shot Molecule GenerationComments: 17 pages, 3 figuresSubjects: Biomolecules (q-bio.BM); Computation and Language (cs.CL); Machine Learning (cs.LG)
Large Language Models (LLMs) have made great strides in areas such as language processing and computer vision. Despite the emergence of diverse techniques to improve few-shot learning capacity, current LLMs fall short in handling the languages in biology and chemistry. For example, they are struggling to capture the relationship between molecule structure and pharmacochemical properties. Consequently, the few-shot learning capacity of small-molecule drug modification remains impeded. In this work, we introduced DrugLLM, a LLM tailored for drug design. During the training process, we employed Group-based Molecular Representation (GMR) to represent molecules, arranging them in sequences that reflect modifications aimed at enhancing specific molecular properties. DrugLLM learns how to modify molecules in drug discovery by predicting the next molecule based on past modifications. Extensive computational experiments demonstrate that DrugLLM can generate new molecules with expected properties based on limited examples, presenting a powerful few-shot molecule generation capacity.
- [650] arXiv:2405.06693 (cross-list from q-bio.BM) [pdf, ps, html, other]
-
Title: SurfPro: Functional Protein Design Based on Continuous SurfaceSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
How can we design proteins with desired functions? We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models the geometric shape and biochemical features of a protein surface, and an autoregressive decoder to produce an amino acid sequence. We evaluate SurfPro on a standard inverse folding benchmark CATH 4.2 and two functional protein design tasks: protein binder design and enzyme design. Our SurfPro consistently surpasses previous state-of-the-art inverse folding methods, achieving a recovery rate of 57.78% on CATH 4.2 and higher success rates in terms of protein-protein binding and enzyme-substrate interaction scores.
- [651] arXiv:2405.06700 (cross-list from physics.soc-ph) [pdf, ps, other]
-
Title: LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and OpportunitiesComments: 11 pages, 0 figures, Hybrid Human Artificial Intelligence (HHAI) 2024Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
As large language models (LLMs) continue to make significant strides, their better integration into agent-based simulations offers a transformational potential for understanding complex social systems. However, such integration is not trivial and poses numerous challenges. Based on this observation, in this paper, we explore architectures and methods to systematically develop LLM-augmented social simulations and discuss potential research directions in this field. We conclude that integrating LLMs with agent-based simulations offers a powerful toolset for researchers and scientists, allowing for more nuanced, realistic, and comprehensive models of complex systems and human behaviours.
- [652] arXiv:2405.06708 (cross-list from q-bio.GN) [pdf, ps, html, other]
-
Title: LangCell: Language-Cell Pre-training for Cell Identity UnderstandingComments: 27 pages, 21 figures, conferenceSubjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, have become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce \textbf{LangCell}, the first \textbf{Lang}uage-\textbf{Cell} pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.
- [653] arXiv:2405.06716 (cross-list from physics.comp-ph) [pdf, ps, other]
-
Title: Discrete Lehmann representation of three-point functionsSubjects: Computational Physics (physics.comp-ph); Strongly Correlated Electrons (cond-mat.str-el); Numerical Analysis (math.NA)
We present a generalization of the discrete Lehmann representation (DLR) to three-point correlation and vertex functions in imaginary time and Matsubara frequency. The representation takes the form of a linear combination of judiciously chosen exponentials in imaginary time, and products of simple poles in Matsubara frequency, which are universal for a given temperature and energy cutoff. We present a systematic algorithm to generate compact sampling grids, from which the coefficients of such an expansion can be obtained by solving a linear system. We show that the explicit form of the representation can be used to evaluate diagrammatic expressions involving infinite Matsubara sums, such as polarization functions or self-energies, with controllable, high-order accuracy. This collection of techniques establishes a framework through which methods involving three-point objects can be implemented robustly, with a substantially reduced computational cost and memory footprint.
- [654] arXiv:2405.06724 (cross-list from q-bio.MN) [pdf, ps, html, other]
-
Title: Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network modelsSubjects: Molecular Networks (q-bio.MN); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Techniques to autonomously drive research have been prominent in Computational Scientific Discovery, while Synthetic Biology is a field of science that focuses on designing and constructing new biological systems for useful purposes. Here we seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. Comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs) are often used to evaluate cellular engineering strategies to optimise target compound production. However, predicted host behaviours are not always correctly described by GEMs, often due to errors in the models. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for microbial engineering.
- [655] arXiv:2405.06725 (cross-list from q-bio.NC) [pdf, ps, html, other]
-
Title: On the Shape of Brainscores for Large Language Models (LLMs)Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
With the rise of Large Language Models (LLMs), the novel metric "Brainscore" emerged as a means to evaluate the functional similarity between LLMs and human brain/neural systems. Our efforts were dedicated to mining the meaning of the novel score by constructing topological features derived from both human fMRI data involving 190 subjects, and 39 LLMs plus their untrained counterparts. Subsequently, we trained 36 Linear Regression Models and conducted thorough statistical analyses to discern reliable and valid features from our constructed ones. Our findings reveal distinctive feature combinations conducive to interpreting existing brainscores across various brain regions of interest (ROIs) and hemispheres, thereby significantly contributing to advancing interpretable machine learning (iML) studies. The study is enriched by our further discussions and analyses concerning existing brainscores. To our knowledge, this study represents the first attempt to comprehend the novel metric brainscore within this interdisciplinary domain.
- [656] arXiv:2405.06727 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function SpacesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that uses complex exponential activation functions. Our proof is constructive and proceeds by conducting a careful complexity analysis associated with the approximation of a Fourier features residual network by a ReLU network.
- [657] arXiv:2405.06729 (cross-list from q-bio.GN) [pdf, ps, html, other]
-
Title: Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect PredictionAleix Lafita, Ferran Gonzalez, Mahmoud Hossam, Paul Smyth, Jacob Deasy, Ari Allyn-Feuer, Daniel Seaton, Stephen YoungComments: Machine Learning for Genomics Explorations workshop at ICLR 2024Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)
Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag experimental accuracy. Here, we present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays using a Normalised Log-odds Ratio (NLR) head. We find consistent improvements in a held-out protein test set, and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
- [658] arXiv:2405.06732 (cross-list from q-bio.NC) [pdf, ps, html, other]
-
Title: A Global Data-Driven Model for The Hippocampus and Nucleus Accumbens of Rat From The Local Field Potential Recordings (LFP)Maedeh Sadeghi (1), Mahdi Aliyari Shoorehdeli (1), Shole jamali (2), Abbas Haghparast (2) ((1) Fault Detection and Identification (FDI) Laboratory, Faculty of Electrical Engineering, K. N. Toosi University of Technology, Tehran, Iran, (2) Neuroscience Research Center, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran)Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)
In brain neural networks, Local Field Potential (LFP) signals represent the dynamic flow of information. Analyzing LFP clinical data plays a critical role in improving our understanding of brain mechanisms. One way to enhance our understanding of these mechanisms is to identify a global model to predict brain signals in different situations. This paper identifies a global data-driven based on LFP recordings of the Nucleus Accumbens and Hippocampus regions in freely moving rats. The LFP is recorded from each rat in two different situations: before and after the process of getting a reward which can be either a drug (Morphine) or natural food (like popcorn or biscuit). A comparison of five machine learning methods including Long Short Term Memory (LSTM), Echo State Network (ESN), Deep Echo State Network (DeepESN), Radial Basis Function (RBF), and Local Linear Model Tree (LLM) is conducted to develop this model. LoLiMoT was chosen with the best performance among all methods. This model can predict the future states of these regions with one pre-trained model. Identifying this model showed that Morphine and natural rewards do not change the dynamic features of neurons in these regions.
- [659] arXiv:2405.06774 (cross-list from q-fin.RM) [pdf, ps, other]
-
Title: Hedging American Put Options with Deep Reinforcement LearningSubjects: Risk Management (q-fin.RM); Machine Learning (cs.LG); Machine Learning (stat.ML)
This article leverages deep reinforcement learning (DRL) to hedge American put options, utilizing the deep deterministic policy gradient (DDPG) method. The agents are first trained and tested with Geometric Brownian Motion (GBM) asset paths and demonstrate superior performance over traditional strategies like the Black-Scholes (BS) Delta, particularly in the presence of transaction costs. To assess the real-world applicability of DRL hedging, a second round of experiments uses a market calibrated stochastic volatility model to train DRL agents. Specifically, 80 put options across 8 symbols are collected, stochastic volatility model coefficients are calibrated for each symbol, and a DRL agent is trained for each of the 80 options by simulating paths of the respective calibrated model. Not only do DRL agents outperform the BS Delta method when testing is conducted using the same calibrated stochastic volatility model data from training, but DRL agents achieves better results when hedging the true asset path that occurred between the option sale date and the maturity. As such, not only does this study present the first DRL agents tailored for American put option hedging, but results on both simulated and empirical market testing data also suggest the optimality of DRL agents over the BS Delta method in real-world scenarios. Finally, note that this study employs a model-agnostic Chebyshev interpolation method to provide DRL agents with option prices at each time step when a stochastic volatility model is used, thereby providing a general framework for an easy extension to more complex underlying asset processes.
- [660] arXiv:2405.06786 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: SAM3D: Zero-Shot Semi-Automatic Segmentation in 3D Medical Images with the Segment Anything ModelSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
We introduce SAM3D, a new approach to semi-automatic zero-shot segmentation of 3D images building on the existing Segment Anything Model. We achieve fast and accurate segmentations in 3D images with a four-step strategy comprising: volume slicing along non-orthogonal axes, efficient prompting in 3D, slice-wise inference using the pretrained SAM, and recoposition and refinement in 3D. We evaluated SAM3D performance qualitatively on an array of imaging modalities and anatomical structures and quantify performance for specific organs in body CT and tumors in brain MRI. By enabling users to create 3D segmentations of unseen data quickly and with dramatically reduced manual input, these methods have the potential to aid surgical planning and education, diagnostic imaging, and scientific research.
- [661] arXiv:2405.06787 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: A computational test of quantum contextuality, and even simpler proofs of quantumnessComments: 69 pages, 6 figures. For updates see this https URLSubjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
Bell non-locality is a fundamental feature of quantum mechanics whereby measurements performed on "spatially separated" quantum systems can exhibit correlations that cannot be understood as revealing predetermined values. This is a special case of the more general phenomenon of "quantum contextuality", which says that such correlations can occur even when the measurements are not necessarily on separate quantum systems, but are merely "compatible" (i.e. commuting). Crucially, while any non-local game yields an experiment that demonstrates quantum advantage by leveraging the "spatial separation" of two or more devices (and in fact several such demonstrations have been conducted successfully in recent years), the same is not true for quantum contextuality: finding the contextuality analogue of such an experiment is arguably one of the central open questions in the foundations of quantum mechanics.
In this work, we show that an arbitrary contextuality game can be compiled into an operational "test of contextuality" involving a single quantum device, by only making the assumption that the device is computationally bounded. Our work is inspired by the recent work of Kalai et al. (STOC '23) that converts any non-local game into a classical test of quantum advantage with a single device. The central idea in their work is to use cryptography to enforce spatial separation within subsystems of a single quantum device. Our work can be seen as using cryptography to enforce "temporal separation", i.e. to restrict communication between sequential measurements.
Beyond contextuality, we employ our ideas to design a "proof of quantumness" that, to the best of our knowledge, is arguably even simpler than the ones proposed in the literature so far. - [662] arXiv:2405.06789 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Self-Consistent Recursive Diffusion Bridge for Medical Image TranslationComments: 11 pages, 6 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Denoising diffusion models (DDM) have gained recent traction in medical image translation given improved training stability over adversarial models. DDMs learn a multi-step denoising transformation to progressively map random Gaussian-noise images onto target-modality images, while receiving stationary guidance from source-modality images. As this denoising transformation diverges significantly from the task-relevant source-to-target transformation, DDMs can suffer from weak source-modality guidance. Here, we propose a novel self-consistent recursive diffusion bridge (SelfRDB) for improved performance in medical image translation. Unlike DDMs, SelfRDB employs a novel forward process with start- and end-points defined based on target and source images, respectively. Intermediate image samples across the process are expressed via a normal distribution with mean taken as a convex combination of start-end points, and variance from additive noise. Unlike regular diffusion bridges that prescribe zero variance at start-end points and high variance at mid-point of the process, we propose a novel noise scheduling with monotonically increasing variance towards the end-point in order to boost generalization performance and facilitate information transfer between the two modalities. To further enhance sampling accuracy in each reverse step, we propose a novel sampling procedure where the network recursively generates a transient-estimate of the target image until convergence onto a self-consistent solution. Comprehensive analyses in multi-contrast MRI and MRI-CT translation indicate that SelfRDB offers superior performance against competing methods.
- [663] arXiv:2405.06808 (cross-list from q-fin.RM) [pdf, ps, html, other]
-
Title: Large Language Model in Financial Regulatory InterpretationSubjects: Risk Management (q-fin.RM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
This study explores the innovative use of Large Language Models (LLMs) as analytical tools for interpreting complex financial regulations. The primary objective is to design effective prompts that guide LLMs in distilling verbose and intricate regulatory texts, such as the Basel III capital requirement regulations, into a concise mathematical framework that can be subsequently translated into actionable code. This novel approach aims to streamline the implementation of regulatory mandates within the financial reporting and risk management systems of global banking institutions. A case study was conducted to assess the performance of various LLMs, demonstrating that GPT-4 outperforms other models in processing and collecting necessary information, as well as executing mathematical calculations. The case study utilized numerical simulations with asset holdings -- including fixed income, equities, currency pairs, and commodities -- to demonstrate how LLMs can effectively implement the Basel III capital adequacy requirements.
- [664] arXiv:2405.06809 (cross-list from math.CA) [pdf, ps, html, other]
-
Title: Spectral parameter power series representation for regular solutions of the radial Dirac systemSubjects: Classical Analysis and ODEs (math.CA); Mathematical Physics (math-ph); Numerical Analysis (math.NA)
A spectral parameter power series (SPPS) representation for the regular solution of the radial Dirac system with complex coefficients is obtained, as well as a SPPS representation for the (entire) characteristic function of the corresponding spectral problem on a finite interval. Based on the SPPS representation, a numerical method for solving spectral problems is developed. It is shown that the method is also applicable to solving spectral problems for perturbed Bessel equations. We exhibit that the proposed numerical method delivers excellent results. Additionally, an application of the method to find the energy values of hydrogen-like atoms with a finite radius is presented.
- [665] arXiv:2405.06836 (cross-list from q-bio.BM) [pdf, ps, html, other]
-
Title: Improving Targeted Molecule Generation through Language Model Fine-Tuning Via Reinforcement LearningSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Developing new drugs is laborious and costly, demanding extensive time investment. In this study, we introduce an innovative de-novo drug design strategy, which harnesses the capabilities of language models to devise targeted drugs for specific proteins. Employing a Reinforcement Learning (RL) framework utilizing Proximal Policy Optimization (PPO), we refine the model to acquire a policy for generating drugs tailored to protein targets. Our method integrates a composite reward function, combining considerations of drug-target interaction and molecular validity. Following RL fine-tuning, our approach demonstrates promising outcomes, yielding notable improvements in molecular validity, interaction efficacy, and critical chemical properties, achieving 65.37 for Quantitative Estimation of Drug-likeness (QED), 321.55 for Molecular Weight (MW), and 4.47 for Octanol-Water Partition Coefficient (logP), respectively. Furthermore, out of the generated drugs, only 0.041\% do not exhibit novelty.
- [666] arXiv:2405.06851 (cross-list from q-bio.NC) [pdf, ps, html, other]
-
Title: Nonlinear classification of neural manifolds with contextual informationComments: 5 pages, 5 figuresSubjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning. Recent approaches analyze the statistical and geometrical attributes of neural representations as population-level mechanistic descriptors of task implementation. In particular, manifold capacity has emerged as a promising framework linking population geometry to the separability of neural manifolds. However, this metric has been limited to linear readouts. Here, we propose a theoretical framework that overcomes this limitation by leveraging contextual input information. We derive an exact formula for the context-dependent capacity that depends on manifold geometry and context correlations, and validate it on synthetic and real data. Our framework's increased expressivity captures representation untanglement in deep networks at early stages of the layer hierarchy, previously inaccessible to analysis. As context-dependent nonlinearity is ubiquitous in neural systems, our data-driven and theoretically grounded approach promises to elucidate context-dependent computation across scales, datasets, and models.
- [667] arXiv:2405.06852 (cross-list from math.LO) [pdf, ps, html, other]
-
Title: Possibility SemanticsComments: Previous version in Selected Topics from Contemporary Logics, ed. Melvin Fitting, Volume 2 of Landscapes in Logic, College Publications, London, 2021, ISBN 97-1-84890-350-0, pp. 363-476. This version corrects Section 4.3Subjects: Logic (math.LO); Logic in Computer Science (cs.LO)
In traditional semantics for classical logic and its extensions, such as modal logic, propositions are interpreted as subsets of a set, as in discrete duality, or as clopen sets of a Stone space, as in topological duality. A point in such a set can be viewed as a "possible world," with the key property of a world being primeness--a world makes a disjunction true only if it makes one of the disjuncts true--which classically implies totality--for each proposition, a world either makes the proposition true or makes its negation true. This chapter surveys a more general approach to logical semantics, known as possibility semantics, which replaces possible worlds with possibly partial "possibilities." In classical possibility semantics, propositions are interpreted as regular open sets of a poset, as in set-theoretic forcing, or as compact regular open sets of an upper Vietoris space, as in the recent theory of "choice-free Stone duality." The elements of these sets, viewed as possibilities, may be partial in the sense of making a disjunction true without settling which disjunct is true. We explain how possibilities may be used in semantics for classical logic and modal logics and generalized to semantics for intuitionistic logics. The goals are to overcome or deepen incompleteness results for traditional semantics, to avoid the nonconstructivity of traditional semantics, and to provide richer structures for the interpretation of new languages.
- [668] arXiv:2405.06880 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image SegmentationComments: 14 pages, 5 figures, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reduction in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at this https URL.
- [669] arXiv:2405.06961 (cross-list from math.LO) [pdf, ps, other]
-
Title: Dimensionality and randomnessSubjects: Logic (math.LO); Discrete Mathematics (cs.DM); Information Theory (cs.IT)
Arranging the bits of a random string or real into k columns of a two-dimensional array or higher dimensional structure is typically accompanied with loss in the Kolmogorov complexity of the columns, which depends on k. We quantify and characterize this phenomenon for arrays and trees and its relationship to negligible classes.
- [670] arXiv:2405.07013 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Energy Reduction in Cell-Free Massive MIMO through Fine-Grained Resource ManagementComments: EuCNC/6G Summit 2024Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
The physical layer foundations of cell-free massive MIMO (CF-mMIMO) have been well-established. As a next step, researchers are investigating practical and energy-efficient network implementations. This paper focuses on multiple sets of access points (APs) where user equipments (UEs) are served in each set, termed a federation, without inter-federation interference. The combination of federations and CF-mMIMO shows promise for highly-loaded scenarios. Our aim is to minimize the total energy consumption while adhering to UE downlink data rate constraints. The energy expenditure of the full system is modelled using a detailed hardware model of the APs. We jointly design the AP-UE association variables, determine active APs, and assign APs and UEs to federations. To solve this highly combinatorial problem, we develop a novel alternating optimization algorithm. Simulation results for an indoor factory demonstrate the advantages of considering multiple federations, particularly when facing large data rate requirements. Furthermore, we show that adopting a more distributed CF-mMIMO architecture is necessary to meet the data rate requirements. Conversely, if feasible, using a less distributed system with more antennas at each AP is more advantageous from an energy savings perspective.
- [671] arXiv:2405.07021 (cross-list from eess.AS) [pdf, ps, other]
-
Title: IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source LocalizationSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Extracting direct-path spatial feature is crucial for sound source localization in adverse acoustic environments. This paper proposes the IPDnet, a neural network that estimates direct-path inter-channel phase difference (DP-IPD) of sound sources from microphone array signals. The estimated DP-IPD can be easily translated to source location based on the known microphone array geometry. First, a full-band and narrow-band fusion network is proposed for DP-IPD estimation, in which alternating narrow-band and full-band layers are responsible for estimating the rough DP-IPD information in one frequency band and capturing the frequency correlations of DP-IPD, respectively. Second, a new multi-track DP-IPD learning target is proposed for the localization of flexible number of sound sources. Third, the IPDnet is extend to handling variable microphone arrays, once trained which is able to process arbitrary microphone arrays with different number of channels and array topology. Experiments of multiple-moving-speaker localization are conducted on both simulated and real-world data, which show that the proposed full-band and narrow-band fusion network and the proposed multi-track DP-IPD learning target together achieves excellent sound source localization performance. Moreover, the proposed variable-array model generalizes well to unseen microphone arrays.
- [672] arXiv:2405.07023 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient ConvolutionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Real-SR endeavors to produce high-resolution images with rich details while mitigating the impact of multiple degradation factors. Although existing methods have achieved impressive achievements in detail recovery, they still fall short when addressing regions with complex gradient arrangements due to the intensity-based linear weighting feature extraction manner. Moreover, the stochastic artifacts introduced by degradation cues during the imaging process in real LR increase the disorder of the overall image details, further complicating the perception of intrinsic gradient arrangement. To address these challenges, we innovatively introduce kernel-wise differential operations within the convolutional kernel and develop several learnable directional gradient convolutions. These convolutions are integrated in parallel with a novel linear weighting mechanism to form an Adaptive Directional Gradient Convolution (DGConv), which adaptively weights and fuses the basic directional gradients to improve the gradient arrangement perception capability for both regular and irregular textures. Coupled with DGConv, we further devise a novel equivalent parameter fusion method for DGConv that maintains its rich representational capabilities while keeping computational costs consistent with a single Vanilla Convolution (VConv), enabling DGConv to improve the performance of existing super-resolution networks without incurring additional computational expenses. To better leverage the superiority of DGConv, we further develop an Adaptive Information Interaction Block (AIIBlock) to adeptly balance the enhancement of texture and contrast while meticulously investigating the interdependencies, culminating in the creation of a DGPNet for Real-SR through simple stacking. Comparative results with 15 SOTA methods across three public datasets underscore the effectiveness and efficiency of our proposed approach.
- [673] arXiv:2405.07048 (cross-list from math.OC) [pdf, ps, other]
-
Title: Method of Successive Approximations for Stochastic Optimal Control: Contractivity and ConvergenceSubjects: Optimization and Control (math.OC); Dynamical Systems (math.DS); Numerical Analysis (math.NA); Probability (math.PR)
The Method of Successive Approximations (MSA) is a fixed-point iterative method used to solve stochastic optimal control problems. It is an indirect method based on the conditions derived from the Stochastic Maximum Principle (SMP), an extension of the Pontryagin Maximum Principle (PMP) to stochastic control problems. In this study, we investigate the contractivity and the convergence of MSA for a specific and interesting class of stochastic dynamical systems (when the drift coefficient is one-sided-Lipschitz with a negative constant and the diffusion coefficient is Lipschitz continuous). Our analysis unfolds in three key steps: firstly, we prove the stability of the state process with respect to the control process. Secondly, we establish the stability of the adjoint process. Finally, we present rigorous evidence to prove the contractivity and then the convergence of MSA. This study contributes to enhancing the understanding of MSA's applicability and effectiveness in addressing stochastic optimal control problems.
- [674] arXiv:2405.07068 (cross-list from math.OC) [pdf, ps, other]
-
Title: Catastrophe Insurance: An Adaptive Robust Optimization ApproachSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
The escalating frequency and severity of natural disasters, exacerbated by climate change, underscore the critical role of insurance in facilitating recovery and promoting investments in risk reduction. This work introduces a novel Adaptive Robust Optimization (ARO) framework tailored for the calculation of catastrophe insurance premiums, with a case study applied to the United States National Flood Insurance Program (NFIP). To the best of our knowledge, it is the first time an ARO approach has been applied to for disaster insurance pricing. Our methodology is designed to protect against both historical and emerging risks, the latter predicted by machine learning models, thus directly incorporating amplified risks induced by climate change. Using the US flood insurance data as a case study, optimization models demonstrate effectiveness in covering losses and produce surpluses, with a smooth balance transition through parameter fine-tuning. Among tested optimization models, results show ARO models with conservative parameter values achieving low number of insolvent states with the least insurance premium charged. Overall, optimization frameworks offer versatility and generalizability, making it adaptable to a variety of natural disaster scenarios, such as wildfires, droughts, etc. This work not only advances the field of insurance premium modeling but also serves as a vital tool for policymakers and stakeholders in building resilience to the growing risks of natural catastrophes.
- [675] arXiv:2405.07084 (cross-list from econ.TH) [pdf, ps, html, other]
-
Title: Counting steps for re-stabilization in a labor matching marketSubjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)
We study a one-to-one labor matching market. If a worker considers resigning from her current job to obtain a better one, how long does it take for this worker to actually get it? We present an algorithm that models this situation as a re-stabilization process involving a vacancy chain. Each step of the algorithm is a link of such a chain. We show that the length of this vacancy chain, which can be interpreted as the time the worker has to wait for her new job, is intimately connected with the lattice structure of the set of stable matchings of the market. Namely, this length can be computed by considering the cardinalities of cycles in preferences derived from the initial and final stable matchings involved.
- [676] arXiv:2405.07105 (cross-list from cond-mat.mtrl-sci) [pdf, ps, html, other]
-
Title: Overcoming systematic softening in universal machine learning interatomic potentials by fine-tuningBowen Deng, Yunyeong Choi, Peichen Zhong, Janosh Riebesell, Shashwat Anand, Zhuohan Li, KyuJung Jun, Kristin A. Persson, Gerbrand CederSubjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Machine learning interatomic potentials (MLIPs) have introduced a new paradigm for atomic simulations. Recent advancements have seen the emergence of universal MLIPs (uMLIPs) that are pre-trained on diverse materials datasets, providing opportunities for both ready-to-use universal force fields and robust foundations for downstream machine learning refinements. However, their performance in extrapolating to out-of-distribution complex atomic environments remains unclear. In this study, we highlight a consistent potential energy surface (PES) softening effect in three uMLIPs: M3GNet, CHGNet, and MACE-MP-0, which is characterized by energy and force under-prediction in a series of atomic-modeling benchmarks including surfaces, defects, solid-solution energetics, phonon vibration modes, ion migration barriers, and general high-energy states.
We find that the PES softening behavior originates from a systematic underprediction error of the PES curvature, which derives from the biased sampling of near-equilibrium atomic arrangements in uMLIP pre-training datasets. We demonstrate that the PES softening issue can be effectively rectified by fine-tuning with a single additional data point. Our findings suggest that a considerable fraction of uMLIP errors are highly systematic, and can therefore be efficiently corrected. This result rationalizes the data-efficient fine-tuning performance boost commonly observed with foundational MLIPs. We argue for the importance of a comprehensive materials dataset with improved PES sampling for next-generation foundational MLIPs. - [677] arXiv:2405.07110 (cross-list from q-bio.PE) [pdf, ps, html, other]
-
Title: A Vector Representation for Phylogenetic TreesSubjects: Populations and Evolution (q-bio.PE); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
Good representations for phylogenetic trees and networks are important for optimizing storage efficiency and implementation of scalable methods for the inference and analysis of evolutionary trees for genes, genomes and species. We introduce a new representation for rooted phylogenetic trees that encodes a binary tree on n taxa as a vector of length 2n in which each taxon appears exactly twice. Using this new tree representation, we introduce a novel tree rearrangement operator, called a HOP, that results in a tree space of diameter n and a quadratic neighbourhood size. We also introduce a novel metric, the HOP distance, which is the minimum number of HOPs to transform a tree into another tree. The HOP distance can be computed in near-linear time, a rare instance of a tree rearrangement distance that is tractable. Our experiments show that the HOP distance is better correlated to the Subtree-Prune-and-Regraft distance than the widely used Robinson-Foulds distance. We also describe how the novel tree representation we introduce can be further generalized to tree-child networks.
- [678] arXiv:2405.07115 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Digital Twin Aided Compressive Sensing: Enabling Site-Specific MIMO Hybrid PrecodingComments: 7 pages, 5 figuresSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Compressive sensing is a promising solution for the channel estimation in multiple-input multiple-output (MIMO) systems with large antenna arrays and constrained hardware. Utilizing site-specific channel data from real-world systems, deep learning can be employed to learn the compressive sensing measurement vectors with minimum redundancy, thereby focusing sensing power on promising spatial directions of the channel. Collecting real-world channel data, however, is challenging due to the high overhead resulting from the large number of antennas and hardware constraints. In this paper, we propose leveraging a site-specific digital twin to generate synthetic channel data, which shares a similar distribution with real-world data. The synthetic data is then used to train the deep learning models for learning measurement vectors and hybrid precoder/combiner design in an end-to-end manner. We further propose a model refinement approach to fine-tune the model pre-trained on the digital twin data with a small amount of real-world data. The evaluation results show that, by training the model on the digital twin data, the learned measurement vectors can be efficiently adapted to the environment geometry, leading to high performance of hybrid precoding for real-world deployments. Moreover, the model refinement approach can enable the digital twin aided model to achieve comparable performance to the model trained on the real-world dataset with a significantly reduced amount of real-world data.
- [679] arXiv:2405.07137 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Oracle Separation between Noisy Quantum Polynomial Time and the Polynomial HierarchySubjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC)
This work investigates the oracle separation between the physically motivated complexity class of noisy quantum circuits, inspired by definitions such as those presented by Chen, Cotler, Huang, and Li (2022). We establish that with a constant error rate, separation can be achieved in terms of NP. When the error rate is $\Omega(\log n/n)$, we can extend this result to the separation of PH. Notably, our oracles, in all separations, do not necessitate error correction schemes or fault tolerance, as all quantum circuits are of constant depth. This indicates that even quantum computers with minor errors, without error correction, may surpass classical complexity classes under various scenarios and assumptions. We also explore various common noise settings and present new classical hardness results, generalizing those found in studies by Raz and Tal (2022) and Bassirian, Bouland, Fefferman, Gunn, and Tal (2021), which are of independent interest.
- [680] arXiv:2405.07148 (cross-list from physics.flu-dyn) [pdf, ps, other]
-
Title: Investigate the efficiency of incompressible flow simulations on CPUs and GPUs with BSAMRComments: 22 pages include reference, 9 figuresSubjects: Fluid Dynamics (physics.flu-dyn); Computational Engineering, Finance, and Science (cs.CE)
Adaptive mesh refinement (AMR) is a classical technique about local refinement in space where needed, thus effectively reducing computational costs for HPC-based physics simulations. Although AMR has been used for many years, little reproducible research discusses the impact of software-based parameters on block-structured AMR (BSAMR) efficiency and how to choose them. This article primarily does parametric studies to investigate the computational efficiency of incompressible flows on a block-structured adaptive mesh. The parameters include refining block size, refining frequency, maximum level, and cycling method. A new projection skipping (PS) method is proposed, which brings insights about when and where the projections on coarser levels are safe to be omitted. We conduct extensive tests on different CPUs/GPUs for various 2D/3D incompressible flow cases, including bubble, RT instability, Taylor Green vortex, etc. Several valuable empirical conclusions are obtained to help guide simulations with BSAMR. Codes and all profiling data are available on GitHub.
- [681] arXiv:2405.07163 (cross-list from physics.ed-ph) [pdf, ps, html, other]
-
Title: Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AISubjects: Physics Education (physics.ed-ph); Artificial Intelligence (cs.AI)
Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that shows classroom dynamics, students' drawings with regard to the learning content, textbook illustrations, etc. Unquestioningly, most qualitative analysis of and explanation on image data have been conducted by human researchers, without machine-based automation. It was partially because most image processing artificial intelligence models were not accessible to general educational scholars or explainable due to their complex deep neural network architecture. However, the recent development of Visual Question Answering (VQA) techniques is accomplishing usable visual language models, which receive from the user a question about the given image and returns an answer, both in natural language. Particularly, GPT-4V released by OpenAI, has wide opened the state-of-the-art visual langauge model service so that VQA could be used for a variety of purposes. However, VQA and GPT-4V have not yet been applied to educational studies much. In this position paper, we suggest that GPT-4V contributes to realizing VQA for education. By 'realizing' VQA, we denote two meanings: (1) GPT-4V realizes the utilization of VQA techniques by any educational scholars without technical/accessibility barrier, and (2) GPT-4V makes educational scholars realize the usefulness of VQA to educational research. Given these, this paper aims to introduce VQA for educational studies so that it provides a milestone for educational research methodology. In this paper, chapter II reviews the development of VQA techniques, which primes with the release of GPT-4V. Chapter III reviews the use of image analysis in educational studies. Chapter IV demonstrates how GPT-4V can be used for each research usage reviewed in Chapter III, with operating prompts provided. Finally, chapter V discusses the future implications.
- [682] arXiv:2405.07217 (cross-list from math.PR) [pdf, ps, other]
-
Title: Improved bounds for polylogarithmic graph distances in scale-free percolation and related modelsComments: 21 pagesSubjects: Probability (math.PR); Social and Information Networks (cs.SI); Combinatorics (math.CO)
In this paper, we study graph distances in the geometric random graph models scale-free percolation SFP, geometric inhomogeneous random graphs GIRG, and hyperbolic random graphs HRG. Despite the wide success of the models, the parameter regime in which graph distances are polylogarithmic is poorly understood. We provide new and improved lower bounds. In a certain portion of the parameter regime, those match the known upper bounds.
Compared to the best previous lower bounds by Hao and Heydenreich, our result has several advantages: it gives matching bounds for a larger range of parameters, thus settling the question for a larger portion of the parameter space. It strictly improves the lower bounds by Hao and Heydenreich for all parameters settings in which those bounds were not tight. It gives tail bounds on the probability of having short paths, which imply shape theorems for the $k$-neighbourhood of a vertex whenever our lower bounds are tight, and tight bounds for the size of this $k$-neighbourhood. And last but not least, our proof is much simpler and not much longer than two pages, and we demonstrate that it generalizes well by showing that the same technique also works for first passage percolation. - [683] arXiv:2405.07218 (cross-list from physics.med-ph) [pdf, ps, other]
-
Title: Chained Flexible Capsule Endoscope: Unraveling the Conundrum of Size Limitations and Functional Integration for Gastrointestinal TransitivitySubjects: Medical Physics (physics.med-ph); Systems and Control (eess.SY)
Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, despite lesion detection, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex, potentially risky interventions. To surmount these limitations, this study introduces a chained flexible capsule endoscope (FCE) design concept, specifically conceived to navigate the inherent volume constraints of capsule endoscopes whilst augmenting their therapeutic functionalities. The FCE's distinctive flexibility originates from a conventional rotating joint design and the incision pattern in the flexible material. In vitro experiments validated the passive navigation ability of the FCE in rugged intestinal tracts. Further, the FCE demonstrates consistent reptile-like peristalsis under the influence of an external magnetic field, and possesses the capability for film expansion and disintegration under high-frequency electromagnetic stimulation. These findings illuminate a promising path toward amplifying the therapeutic capacities of capsule endoscopes without necessitating a size compromise.
- [684] arXiv:2405.07222 (cross-list from quant-ph) [pdf, ps, other]
-
Title: What is Quantum Parallelism, Anyhow?Comments: Accepted at ISC HPC 2024 conferenceSubjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)
Central to the power of quantum computing is the concept of quantum parallelism: quantum systems can explore and process multiple computational paths simultaneously. In this paper, we discuss the elusive nature of quantum parallelism, drawing parallels with classical parallel computing models to elucidate its fundamental characteristics and implications for algorithmic performance. We begin by defining quantum parallelism as arising from the superposition of quantum states, allowing for the exploration of multiple computational paths in parallel. To quantify and visualize quantum parallelism, we introduce the concept of quantum dataflow diagrams, which provide a graphical representation of quantum algorithms and their parallel execution paths. We demonstrate how quantum parallelism can be measured and assessed by analyzing quantum algorithms such as the Quantum Fourier Transform (QFT) and Amplitude Amplification (AA) iterations using quantum dataflow diagrams. Furthermore, we examine the interplay between quantum parallelism and classical parallelism laws, including Amdahl's and Gustafson's laws. While these laws were originally formulated for classical parallel computing systems, we reconsider their applicability in the quantum computing domain. We argue that while classical parallelism laws offer valuable insights, their direct application to quantum computing is limited due to the unique characteristics of quantum parallelism, including the role of destructive interference and the inherent limitations of classical-quantum I/O. Our analysis highlights the need for an increased understanding of quantum parallelism and its implications for algorithm design and performance.
- [685] arXiv:2405.07226 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch TheoremSubjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The No-Free-Lunch (NFL) theorem, which quantifies problem- and data-independent generalization errors regardless of the optimization process, provides a foundational framework for comprehending diverse learning protocols' potential. Despite its significance, the establishment of the NFL theorem for quantum machine learning models remains largely unexplored, thereby overlooking broader insights into the fundamental relationship between quantum and classical learning protocols. To address this gap, we categorize a diverse array of quantum learning algorithms into three learning protocols designed for learning quantum dynamics under a specified observable and establish their NFL theorem. The exploited protocols, namely Classical Learning Protocols (CLC-LPs), Restricted Quantum Learning Protocols (ReQu-LPs), and Quantum Learning Protocols (Qu-LPs), offer varying levels of access to quantum resources. Our derived NFL theorems demonstrate quadratic reductions in sample complexity across CLC-LPs, ReQu-LPs, and Qu-LPs, contingent upon the orthogonality of quantum states and the diagonality of observables. We attribute this performance discrepancy to the unique capacity of quantum-related learning protocols to indirectly utilize information concerning the global phases of non-orthogonal quantum states, a distinctive physical feature inherent in quantum mechanics. Our findings not only deepen our understanding of quantum learning protocols' capabilities but also provide practical insights for the development of advanced quantum learning algorithms.
- [686] arXiv:2405.07242 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Fault-Tolerant Quantum LDPC EncodersSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We propose fault-tolerant encoders for quantum low-density parity check (LDPC) codes. By grouping qubits within a quantum code over contiguous blocks and applying preshared entanglement across these blocks, we show how transversal implementation can be realized. The proposed encoder reduces the error propagation while using multi-qubit gates and is applicable for both entanglement-unassisted and entanglement-assisted quantum LDPC codes.
- [687] arXiv:2405.07245 (cross-list from q-bio.PE) [pdf, ps, other]
-
Title: Ecology, Spatial Structure, and Selection Pressure Induce Strong Signatures in Phylogenetic StructureSubjects: Populations and Evolution (q-bio.PE); Neural and Evolutionary Computing (cs.NE)
Evolutionary dynamics are shaped by a variety of fundamental, generic drivers, including spatial structure, ecology, and selection pressure. These drivers impact the trajectory of evolution, and have been hypothesized to influence phylogenetic structure. Here, we set out to assess (1) if spatial structure, ecology, and selection pressure leave detectable signatures in phylogenetic structure, (2) the extent, in particular, to which ecology can be detected and discerned in the presence of spatial structure, and (3) the extent to which these phylogenetic signatures generalize across evolutionary systems. To this end, we analyze phylogenies generated by manipulating spatial structure, ecology, and selection pressure within three computational models of varied scope and sophistication. We find that selection pressure, spatial structure, and ecology have characteristic effects on phylogenetic metrics, although these effects are complex and not always intuitive. Signatures have some consistency across systems when using equivalent taxonomic unit definitions (e.g., individual, genotype, species). Further, we find that sufficiently strong ecology can be detected in the presence of spatial structure. We also find that, while low-resolution phylogenetic reconstructions can bias some phylogenetic metrics, high-resolution reconstructions recapitulate them faithfully. Although our results suggest potential for evolutionary inference of spatial structure, ecology, and selection pressure through phylogenetic analysis, further methods development is needed to distinguish these drivers' phylometric signatures from each other and to appropriately normalize phylogenetic metrics. With such work, phylogenetic analysis could provide a versatile toolkit to study large-scale evolving populations.
- [688] arXiv:2405.07256 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Leveraging Fixed and Dynamic Pseudo-labels for Semi-supervised Medical Image SegmentationComments: Under ReviewSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Semi-supervised medical image segmentation has gained growing interest due to its ability to utilize unannotated data. The current state-of-the-art methods mostly rely on pseudo-labeling within a co-training framework. These methods depend on a single pseudo-label for training, but these labels are not as accurate as the ground truth of labeled data. Relying solely on one pseudo-label often results in suboptimal results. To this end, we propose a novel approach where multiple pseudo-labels for the same unannotated image are used to learn from the unlabeled data: the conventional fixed pseudo-label and the newly introduced dynamic pseudo-label. By incorporating multiple pseudo-labels for the same unannotated image into the co-training framework, our approach provides a more robust training approach that improves model performance and generalization capabilities. We validate our novel approach on three semi-supervised medical benchmark segmentation datasets, the Left Atrium dataset, the Pancreas-CT dataset, and the Brats-2019 dataset. Our approach significantly outperforms state-of-the-art methods over multiple medical benchmark segmentation datasets with different labeled data ratios. We also present several ablation experiments to demonstrate the effectiveness of various components used in our approach.
- [689] arXiv:2405.07295 (cross-list from q-bio.NC) [pdf, ps, other]
-
Title: Environmental enrichment: a biological model of forward transfer in continual learningComments: 23 pages, 5 figuresSubjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
Continual learning (CL) refers to an agent's capability to learn from a continuous stream of data and transfer knowledge without forgetting old information. One crucial aspect of CL is forward transfer, i.e., improved and faster learning on a new task by leveraging information from prior knowledge. While this ability comes naturally to biological brains, it poses a significant challenge for artificial intelligence (AI). Here, we suggest that environmental enrichment (EE) can be used as a biological model for studying forward transfer, inspiring human-like AI development. EE refers to animal studies that enhance cognitive, social, motor, and sensory stimulation and is a model for what, in humans, is referred to as 'cognitive reserve'. Enriched animals show significant improvement in learning speed and performance on new tasks, typically exhibiting forward transfer. We explore anatomical, molecular, and neuronal changes post-EE and discuss how artificial neural networks (ANNs) can be used to predict neural computation changes after enriched experiences. Finally, we provide a synergistic way of combining neuroscience and AI research that paves the path toward developing AI capable of rapid and efficient new task learning.
- [690] arXiv:2405.07297 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Beyond Diagonal Reconfigurable Intelligent Surfaces in Wideband OFDM Communications: Circuit-Based Modeling and OptimizationComments: 12 pages, 6 figures, submitted to IEEE journal. arXiv admin note: text overlap with arXiv:2403.12893Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This work investigates the modeling and optimization of beyond diagonal reconfigurable intelligent surface (BD-RIS), which generalizes conventional RIS with diagonal phase shift matrices and provides additional flexibility for manipulating wireless channels, in wideband communication systems. Specifically, we start from the signal modeling of the BD-RIS-aided orthogonal frequency division multiplexing (OFDM) system, which bridges the time-domain and frequency-domain channels, and explicitly shows the frequency dependence of the BD-RIS response. We next characterize the frequency dependence of the BD-RIS response based on circuit models. Benefiting from the admittance parameter analysis, we model individually each tunable admittance component of BD-RIS and derive an approximated linear expression with respect to the frequency of the transmit signals. With the proposed signal model for the BD-RIS-aided OFDM system and the frequency-dependent BD-RIS model, we propose algorithms to optimize the BD-RIS and the power allocation at the transmitter to maximize the average rate for a BD-RIS-aided OFDM system. Finally, simulation results show that BD-RIS outperforms conventional RIS in the OFDM system. More importantly, the impact of wideband modeling of BD-RIS on the system performance becomes more significant as the circuit complexity of BD-RIS architectures increases.
- [691] arXiv:2405.07318 (cross-list from eess.SP) [pdf, ps, other]
-
Title: AdaptNet: Rethinking Sensing and Communication for a Seamless Internet of Drones ExperienceSubjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
In the evolving era of Unmanned Aerial Vehicles (UAVs), the emphasis has moved from mere data collection to strategically obtaining timely and relevant data within the Internet of Drones (IoDs) ecosystem. However, the unpredictable conditions in dynamic IoDs pose safety challenges for drones. Addressing this, our approach introduces a multi-UAV framework using spatial-temporal clustering and the Frechet distance for enhancing reliability. Seamlessly coupled with Integrated Sensing and Communication (ISAC), it enhances the precision and agility of UAV networks. Our Multi-Agent Reinforcement Learning (MARL) mechanism ensures UAVs adapt strategies through ongoing environmental interactions and enhancing intelligent sensing. This focus ensures operational safety and efficiency, considering data capture and transmission viability. By evaluating the relevance of the sensed information, we can communicate only the most crucial data variations beyond a set threshold and optimize bandwidth usage. Our methodology transforms the UAV domain, transitioning drones from data gatherers to adept information orchestrators, establishing a benchmark for efficiency and adaptability in modern aerial systems.
- [692] arXiv:2405.07323 (cross-list from econ.GN) [pdf, ps, other]
-
Title: Computational analysis of US Congressional speeches reveals a shift from evidence to intuitionSegun Taofeek Aroyehun, Almog Simchon, Fabio Carrella, Jana Lasser, Stephan Lewandowsky, David GarciaSubjects: General Economics (econ.GN); Computers and Society (cs.CY)
Pursuit of honest and truthful decision-making is crucial for governance and accountability in democracies. However, people sometimes take different perspectives of what it means to be honest and how to pursue truthfulness. Here we explore a continuum of perspectives from evidence-based reasoning, rooted in ascertainable facts and data, at one end, to intuitive decisions that are driven by feelings and subjective interpretations, at the other. We analyze the linguistic traces of those contrasting perspectives in Congressional speeches from 1879 to 2022. We find that evidence-based language has continued to decline since the mid-1970s, together with a decline in legislative productivity. The decline was accompanied by increasing partisan polarization in Congress and rising income inequality in society. Results highlight the importance of evidence-based language in political decision-making.
- [693] arXiv:2405.07328 (cross-list from math.DS) [pdf, ps, other]
-
Title: An algorithm for distributed time delay identification based on a mixed Erlang kernel approximation and the linear chain trickSubjects: Dynamical Systems (math.DS); Systems and Control (eess.SY)
Time delays are ubiquitous in industry and nature, and they significantly affect both transient dynamics and stability properties. Consequently, it is often necessary to identify and account for the delays when, e.g., designing a model-based control strategy. However, identifying delays in differential equations is not straightforward and requires specialized methods. Therefore, we propose an algorithm for identifying distributed delays in delay differential equations (DDEs) that only involves simulation of ordinary differential equations (ODEs). Specifically, we 1) approximate the kernel in the DDEs (also called the memory function) by the probability density function of a mixed Erlang distribution and 2) use the linear chain trick (LCT) to transform the resulting DDEs into ODEs. Finally, the parameters in the kernel approximation are estimated as the solution to a dynamical least-squares problem, and we use a single-shooting approach to approximate this solution. We demonstrate the efficacy of the algorithm using numerical examples that involve the logistic equation and a point reactor kinetics model of a molten salt nuclear fission reactor.
- [694] arXiv:2405.07333 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Quantum Mini-Apps: A Framework for Developing and Benchmarking Quantum-HPC ApplicationsComments: 9 pages, 4 figuresSubjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)
With the increasing maturity and scale of quantum hardware and its integration into HPC systems, there is a need to develop robust techniques for developing, characterizing, and benchmarking quantum-HPC applications and middleware systems. This requires a better understanding of interaction, coupling, and common execution patterns between quantum and classical workload tasks and components. This paper identifies six quantum-HPC execution motifs - recurring execution patterns characterized by distinct coupling and interaction modes. These motifs provide the basis for a suite of quantum mini-apps - simplified application prototypes that encapsulate essential characteristics of production systems. To support these developments, we introduce a mini-app framework that offers the necessary abstractions for creating and executing mini-apps across heterogeneous quantum-HPC infrastructure, making it a valuable tool for performance characterizations and middleware development.
- [695] arXiv:2405.07337 (cross-list from math.CO) [pdf, ps, other]
-
Title: The Rank-Ramsey Problem and the Log-Rank ConjectureSubjects: Combinatorics (math.CO); Computational Complexity (cs.CC)
A graph is called Rank-Ramsey if (i) Its clique number is small, and (ii) The adjacency matrix of its complement has small rank. We initiate a systematic study of such graphs. Our main motivation is that their constructions, as well as proofs of their non-existence, are intimately related to the famous log-rank conjecture from the field of communication complexity. These investigations also open interesting new avenues in Ramsey theory.
We construct two families of Rank-Ramsey graphs exhibiting polynomial separation between order and complement rank. Graphs in the first family have bounded clique number (as low as $41$). These are subgraphs of certain strong products, whose building blocks are derived from triangle-free strongly-regular graphs. Graphs in the second family are obtained by applying Boolean functions to Erdős-Rényi graphs. Their clique number is logarithmic, but their complement rank is far smaller than in the first family, about $\mathcal{O}(n^{2/3})$. A key component of this construction is our matrix-theoretic view of lifts.
We also consider lower bounds on the Rank-Ramsey numbers, and determine them in the range where the complement rank is $5$ or less. We consider connections between said numbers and other graph parameters, and find that the two best known explicit constructions of triangle-free Ramsey graphs turn out to be far from Rank-Ramsey. - [696] arXiv:2405.07338 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus ImagesFatema Tuj Johora Faria, Mukaffi Bin Moin, Pronay Debnath, Asif Iftekher Fahim, Faisal Muhammad ShahSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Our research focuses on the critical field of early diagnosis of disease by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging due to the limitations of existing methods, which often lack discrimination power and are susceptible to influences from pathological regions. Our research in fundus image analysis advances deep learning-based classification using eight pre-trained CNN models. To enhance interpretability, we utilize Explainable AI techniques such as Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, and Layer CAM. These techniques illuminate the decision-making processes of the models, fostering transparency and trust in their predictions. Expanding our exploration, we investigate ten models, including TransUNet with ResNet backbones, Attention U-Net with DenseNet and ResNet backbones, and Swin-UNET. Incorporating diverse architectures such as ResNet50V2, ResNet101V2, ResNet152V2, and DenseNet121 among others, this comprehensive study deepens our insights into attention mechanisms for enhanced fundus image analysis. Among the evaluated models for fundus image classification, ResNet101 emerged with the highest accuracy, achieving an impressive 94.17%. On the other end of the spectrum, EfficientNetB0 exhibited the lowest accuracy among the models, achieving a score of 88.33%. Furthermore, in the domain of fundus image segmentation, Swin-Unet demonstrated a Mean Pixel Accuracy of 86.19%, showcasing its effectiveness in accurately delineating regions of interest within fundus images. Conversely, Attention U-Net with DenseNet201 backbone exhibited the lowest Mean Pixel Accuracy among the evaluated models, achieving a score of 75.87%.
- [697] arXiv:2405.07342 (cross-list from eess.SP) [pdf, ps, other]
-
Title: AquaIntellect: A Semantic Self-learning Framework for Underwater Internet of Things ConnectivityJournal-ref: 2023 IEEE Virtual Conference on Communications (VCC), NY, USA, 2023, pp. 299-304Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
The emerging paradigm of Non-Conventional Internet of Things (NC IoT), which is focused on the usefulness of information as opposed to the notion of high volume data collection and transmission, will be an important and dominant part of human life in the near future. This paper proposes a novel semantic-based approach for addressing the unique challenges posed by underwater NC IoT. We present an intelligent sensing strategy for exploring the semantics of the underwater environment by judiciously selecting the data to transmit, thereby minimizing redundancy for utmost relevant data transmission. We introduce an evolutionary function for the selection of the semantic-empowered messages relevant to the specific task within a minimum Age of Information (AoI), a freshness metric of the collected information, and for monitoring the underwater environment for performance optimization. A DNN-empowered Bayesian integrated with an adaptive surrogate model optimization will determine the optimal placement strategy of the sensors and the uncertainty level of the underwater landscape. An Adaptive Expected Improvement (AEI) mechanism is introduced to predict the optimal arrival rate for enabling a synchronized data sensing and transmission ecosystem, ensuring efficiency and timeliness. Simulation results show that the proposed solution outperforms conventional approaches.
- [698] arXiv:2405.07380 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Permissible four-strategy quantum extensions of classical gamesSubjects: Quantum Physics (quant-ph); Computer Science and Game Theory (cs.GT)
The study focuses on strategic-form games extended in the Eisert-Wilkens-Lewenstein scheme by two unitary operations. Conditions are determined under which the pair of unitary operators, along with classical strategies, form a game invariant under isomorphic transformations of the input classical game. These conditions are then applied to determine these operators, resulting in five main classes of games satisfying the isomorphism criterion, and a theorem is proved providing a practical criterion for this isomorphism. The interdependencies between different classes of extensions are identified, including limit cases in which one class transforms into another.
- [699] arXiv:2405.07432 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Compressed Online Learning of Conditional Mean EmbeddingComments: 39 pagesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)
The conditional mean embedding (CME) encodes Markovian stochastic kernels through their actions on probability distributions embedded within the reproducing kernel Hilbert spaces (RKHS). The CME plays a key role in several well-known machine learning tasks such as reinforcement learning, analysis of dynamical systems, etc. We present an algorithm to learn the CME incrementally from data via an operator-valued stochastic gradient descent. As is well-known, function learning in RKHS suffers from scalability challenges from large data. We utilize a compression mechanism to counter the scalability challenge. The core contribution of this paper is a finite-sample performance guarantee on the last iterate of the online compressed operator learning algorithm with fast-mixing Markovian samples, when the target CME may not be contained in the hypothesis space. We illustrate the efficacy of our algorithm by applying it to the analysis of an example dynamical system.
- [700] arXiv:2405.07452 (cross-list from q-bio.QM) [pdf, ps, other]
-
Title: PLA-SGCN: Protein-Ligand Binding Affinity Prediction by Integrating Similar Pairs and Semi-supervised Graph Convolutional NetworkComments: Accepted for Publication in Briefings of Bioinformatics (13-Dec-2023)Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
The protein-ligand binding affinity (PLA) prediction goal is to predict whether or not the ligand could bind to a protein sequence. Recently, in PLA prediction, deep learning has received much attention. Two steps are involved in deep learning-based approaches: feature extraction and task prediction step. Many deep learning-based approaches concentrate on introducing new feature extraction networks or integrating auxiliary knowledge like protein-protein interaction networks or gene ontology knowledge. Then, a task prediction network is designed simply using some fully connected layers. This paper aims to integrate retrieved similar hard protein-ligand pairs in PLA prediction (i.e., task prediction step) using a semi-supervised graph convolutional network (GCN). Hard protein-ligand pairs are retrieved for each input query sample based on the manifold smoothness constraint. Then, a graph is learned automatically in which each node is a protein-ligand pair, and each edge represents the similarity between pairs. In other words, an end-to-end framework is proposed that simultaneously retrieves hard similar samples, learns protein-ligand descriptor, learns the graph topology of the input sample with retrieved similar hard samples (learn adjacency matrix), and learns a semi-supervised GCN to predict the binding affinity (as task predictor). The training step adjusts the parameter values, and in the inference step, the learned model is fine-tuned for each input sample. To evaluate the proposed approach, it is applied to the four well-known PDBbind, Davis, KIBA, and BindingDB datasets. The results show that the proposed method significantly performs better than the comparable approaches.
- [701] arXiv:2405.07482 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Marginal Fairness Sliced Wasserstein BarycenterComments: 33 pages, 14 figures, 6 tablesSubjects: Machine Learning (stat.ML); Graphics (cs.GR); Machine Learning (cs.LG)
The sliced Wasserstein barycenter (SWB) is a widely acknowledged method for efficiently generalizing the averaging operation within probability measure spaces. However, achieving marginal fairness SWB, ensuring approximately equal distances from the barycenter to marginals, remains unexplored. The uniform weighted SWB is not necessarily the optimal choice to obtain the desired marginal fairness barycenter due to the heterogeneous structure of marginals and the non-optimality of the optimization. As the first attempt to tackle the problem, we define the marginal fairness sliced Wasserstein barycenter (MFSWB) as a constrained SWB problem. Due to the computational disadvantages of the formal definition, we propose two hyperparameter-free and computationally tractable surrogate MFSWB problems that implicitly minimize the distances to marginals and encourage marginal fairness at the same time. To further improve the efficiency, we perform slicing distribution selection and obtain the third surrogate definition by introducing a new slicing distribution that focuses more on marginally unfair projecting directions. We discuss the relationship of the three proposed problems and their relationship to sliced multi-marginal Wasserstein distance. Finally, we conduct experiments on finding 3D point-clouds averaging, color harmonization, and training of sliced Wasserstein autoencoder with class-fairness representation to show the favorable performance of the proposed surrogate MFSWB problems.
- [702] arXiv:2405.07483 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: A Class of Convex Optimization-Based Recursive Algorithms for Identification of Stochastic SystemsSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
Focusing on identification, this paper develops a class of convex optimization-based criteria and correspondingly the recursive algorithms to estimate the parameter vector $\theta^{*}$ of a stochastic dynamic system. Not only do the criteria include the classical least-squares estimator but also the $L_l=|\cdot|^l, l\geq 1$, the Huber, the Log-cosh, and the Quantile costs as special cases. First, we prove that the minimizers of the convex optimization-based criteria converge to $\theta^{*}$ with probability one. Second, the recursive algorithms are proposed to find the estimates, which minimize the convex optimization-based criteria, and it is shown that these estimates also converge to the true parameter vector with probability one. Numerical examples are given, justifying the performance of the proposed algorithms including the strong consistency of the estimates, the robustness against outliers in the observations, and higher efficiency in online computation compared with the kernel-based regularization method due to the recursive nature.
- [703] arXiv:2405.07491 (cross-list from math.DS) [pdf, ps, other]
-
Title: Bifurcation Patterns and Chaos Control in Discrete-Time Coral Reef ModelSubjects: Dynamical Systems (math.DS); Numerical Analysis (math.NA); Optimization and Control (math.OC)
The reduction in coral reef densities, characterized by the proliferation of macroalgae, has emerged as a global threat. In this paper, we present a discrete-time coral reef dynamical model that incorporates macroalgae. We explore all ecologically possible equilibrium points for the proposed model. The conditions for the local stability of the interior equilibrium point are analyzed, which represents the coexistence of both coral and macroalgae. Furthermore, we investigate the model's behavior using the center manifold theorem and bifurcation theory. Our analysis reveals that the model undergoes codimension-one bifurcations, specifically period-doubling and Neimark-Sacker bifurcations. To address the chaos resulting from the emergence of the Neimark-Sacker bifurcation, we apply the OGY feedback control method and a hybrid control methodology. Finally, we provide numerical simulations not only to validate the obtained results but also to demonstrate the complex dynamic behaviors that arise. These behaviors include reversal period-doubling bifurcation, period-4, 8, and 24 bubble bifurcations, as well as chaotic behavior.
- [704] arXiv:2405.07499 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Distributed Quantum Computation with Minimum Circuit Execution Time over Quantum NetworksSubjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
Present quantum computers are constrained by limited qubit capacity and restricted physical connectivity, leading to challenges in large-scale quantum computations. Distributing quantum computations across a network of quantum computers is a promising way to circumvent these challenges and facilitate large quantum computations. However, distributed quantum computations require entanglements (to execute remote gates) which can incur significant generation latency and, thus, lead to decoherence of qubits. In this work, we consider the problem of distributing quantum circuits across a quantum network to minimize the execution time. The problem entails mapping the circuit qubits to network memories, including within each computer since limited connectivity within computers can affect the circuit execution time. We provide two-step solutions for the above problem: In the first step, we allocate qubits to memories to minimize the estimated execution time; for this step, we design an efficient algorithm based on an approximation algorithm for the max-quadratic-assignment problem. In the second step, we determine an efficient execution scheme, including generating required entanglements with minimum latency under the network resource and decoherence constraints; for this step, we develop two algorithms with appropriate performance guarantees under certain settings or assumptions. We consider multiple protocols for executing remote gates, viz., telegates and cat-entanglements. With extensive simulations over NetSquid, a quantum network simulator, we demonstrate the effectiveness of our developed techniques and show that they outperform a scheme based on prior work by up to 95%.
- [705] arXiv:2405.07512 (cross-list from math.CO) [pdf, ps, other]
-
Title: Separation axiom $S_3$ for geodesic convexity in graphsComments: 59 pages, 2 figuresSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Metric Geometry (math.MG)
Semispaces of a convexity space $(X,C)$ are maximal convex sets missing a point. The separation axiom $S_3$ asserts that any point $x_0\in X$ and any convex set $A$ not containing $x_0$ can be separated by complementary halfspaces (convex sets with convex complements) or, equivalently, that all semispaces are halfspaces. In this paper, we study $S_3$ for geodesic convexity in graphs and the structure of semispaces in $S_3$-graphs. We characterize $S_3$-graphs and their semispaces in terms of separation by halfspaces of vertices $x_0$ and special sets, called maximal $x_0$-proximal sets and in terms of convexity of their mutual shadows $x_0/K$ and $K/x_0$. In $S_3$-graphs $G$ satisfying the triangle condition (TC), maximal proximal sets are the pre-maximal cliques of $G$ (i.e., cliques $K$ such that $K\cup\{ x_0\}$ are maximal cliques). This allows to characterize the $S_3$-graphs satisfying (TC) in a structural way and to enumerate their semispaces efficiently. In case of meshed graphs (an important subclass of graphs satisfying (TC)), the $S_3$-graphs have been characterized by excluding five forbidden subgraphs. On the way of proving this result, we also establish some properties of meshed graphs, which maybe of independent interest. In particular, we show that any connected, locally-convex set of a meshed graph is convex. We also provide several examples of $S_3$-graphs, including the basis graphs of matroids. Finally, we consider the (NP-complete) halfspace separation problem, describe two methods of its solution, and apply them to particular classes of graphs and graph-convexities.
- [706] arXiv:2405.07552 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support RecoveryComments: Forty-first International Conference on Machine Learning (ICML 2024)Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
In this paper, we focus on distributed estimation and support recovery for high-dimensional linear quantile regression. Quantile regression is a popular alternative tool to the least squares regression for robustness against outliers and data heterogeneity. However, the non-smoothness of the check loss function poses big challenges to both computation and theory in the distributed setting. To tackle these problems, we transform the original quantile regression into the least-squares optimization. By applying a double-smoothing approach, we extend a previous Newton-type distributed approach without the restrictive independent assumption between the error term and covariates. An efficient algorithm is developed, which enjoys high computation and communication efficiency. Theoretically, the proposed distributed estimator achieves a near-oracle convergence rate and high support recovery accuracy after a constant number of iterations. Extensive experiments on synthetic examples and a real data application further demonstrate the effectiveness of the proposed method.
- [707] arXiv:2405.07574 (cross-list from physics.soc-ph) [pdf, ps, other]
-
Title: Is it getting harder to make a hit? Evidence from 65 years of US music chart historyComments: 17 pages, 4 figuresSubjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)
Since the creation of the Billboard Hot 100 music chart in 1958, the chart has been a window into the music consumption of Americans. Which songs succeed on the chart is decided by consumption volumes, which can be affected by consumer music taste, and other factors such as advertisement budgets, airplay time, the specifics of ranking algorithms, and more. Since its introduction, the chart has documented music consumerism through eras of globalization, economic growth, and the emergence of new technologies for music listening. In recent years, musicians and other hitmakers have voiced their worry that the music world is changing: Many claim that it is getting harder to make a hit but until now, the claims have not been backed using chart data. Here we show that the dynamics of the Billboard Hot 100 chart have changed significantly since the chart's founding in 1958, and in particular in the past 15 years. Whereas most songs spend less time on the chart now than songs did in the past, we show that top-1 songs have tripled their chart lifetime since the 1960s, the highest-ranked songs maintain their positions for far longer than previously, and the lowest-ranked songs are replaced more frequently than ever. At the same time, who occupies the chart has also changed over the years: In recent years, fewer new artists make it into the chart and more positions are occupied by established hit makers. Finally, investigating how song chart trajectories have changed over time, we show that historical song trajectories cluster into clear trajectory archetypes characteristic of the time period they were part of. The results are interesting in the context of collective attention: Whereas recent studies have documented that other cultural products such as books, news, and movies fade in popularity quicker in recent years, music hits seem to last longer now than in the past.
- [708] arXiv:2405.07589 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Entanglement Swapping in Orbit: a Satellite Quantum Link Case StudyPaolo Fittipaldi, Kentaro Teramoto, Naphan Benchasattabuse, Michal Hajdušek, Rodney Van Meter, Frédéric GrosshansComments: 7 pages, 6 figures. Submitted to the IEEE International Conference on Quantum Computing and Engineering 2024Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)
Satellite quantum communication is a promising way to build long distance quantum links, making it an essential complement to optical fiber for quantum internetworking beyond metropolitan scales. A satellite point to point optical link differs from the more common fiber links in many ways, both quantitative (higher latency, strong losses) and qualitative (nonconstant parameter values during satellite passage, intermittency of the link, impossibility to set repeaters between the satellite and the ground station). We study here the performance of a quantum link between two ground stations, using a quantum-memory-equipped satellite as a quantum repeater. In contrast with quantum key distribution satellite links, the number of available quantum memory slots m, together with the unavoidable round-trip communication latency t of at least a few milliseconds, severely reduces the effective average repetition rate to m/t -- at most a few kilohertz for foreseeable quantum memories. Our study uses two approaches, which validate each other: 1) a simple analytical model of the effective rate of the quantum link; 2) an event-based simulation using the open source Quantum Internet Simulation Package (QuISP). The important differences between satellite and fiber links led us to modify QuISP itself. This work paves the way to the study of hybrid satellite- and fiber-based quantum repeater networks interconnecting different metropolitan areas.
- [709] arXiv:2405.07599 (cross-list from physics.comp-ph) [pdf, ps, html, other]
-
Title: Transferable Neural Wavefunctions for SolidsComments: 15 pages, 3 figures, + supplementary informationSubjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG)
Deep-Learning-based Variational Monte Carlo (DL-VMC) has recently emerged as a highly accurate approach for finding approximate solutions to the many-electron Schrödinger equation. Despite its favorable scaling with the number of electrons, $\mathcal{O}(n_\text{el}^{4})$, the practical value of DL-VMC is limited by the high cost of optimizing the neural network weights for every system studied. To mitigate this problem, recent research has proposed optimizing a single neural network across multiple systems, reducing the cost per system. Here we extend this approach to solids, where similar but distinct calculations using different geometries, boundary conditions, and supercell sizes are often required. We show how to optimize a single ansatz across all of these variations, reducing the required number of optimization steps by an order of magnitude. Furthermore, we exploit the transfer capabilities of a pre-trained network. We successfully transfer a network, pre-trained on 2x2x2 supercells of LiH, to 3x3x3 supercells. This reduces the number of optimization steps required to simulate the large system by a factor of 50 compared to previous work.
- [710] arXiv:2405.07619 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descentSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Image classification based on over-parametrized convolutional neural networks with a global average-pooling layer is considered. The weights of the network are learned by gradient descent. A bound on the rate of convergence of the difference between the misclassification risk of the newly introduced convolutional neural network estimate and the minimal possible value is derived.
- [711] arXiv:2405.07622 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: De novo antibody design with SE(3) diffusionComments: 20 pages, 11 figures, 4 tables, model weights and samples available at this https URLSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
We introduce IgDiff, an antibody variable domain diffusion model based on a general protein backbone diffusion framework which was extended to handle multiple chains. Assessing the designability and novelty of the structures generated with our model, we find that IgDiff produces highly designable antibodies that can contain novel binding regions. The backbone dihedral angles of sampled structures show good agreement with a reference antibody distribution. We verify these designed antibodies experimentally and find that all express with high yield. Finally, we compare our model with a state-of-the-art generative backbone diffusion model on a range of antibody design tasks, such as the design of the complementarity determining regions or the pairing of a light chain to an existing heavy chain, and show improved properties and designability.
- [712] arXiv:2405.07624 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Towards Robust Benchmarking of Quantum Optimization AlgorithmsSubjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
Benchmarking the performance of quantum optimization algorithms is crucial for identifying utility for industry-relevant use cases. Benchmarking processes vary between optimization applications and depend on user-specified goals. The heuristic nature of quantum algorithms poses challenges, especially when comparing to classical counterparts. A key problem in existing benchmarking frameworks is the lack of equal effort in optimizing for the best quantum and, respectively, classical approaches. This paper presents a comprehensive set of guidelines comprising universal steps towards fair benchmarks. We discuss (1) application-specific algorithm choice, ensuring every solver is provided with the most fitting mathematical formulation of a problem; (2) the selection of benchmark data, including hard instances and real-world samples; (3) the choice of a suitable holistic figure of merit, like time-to-solution or solution quality within time constraints; and (4) equitable hyperparameter training to eliminate bias towards a particular method. The proposed guidelines are tested across three benchmarking scenarios, utilizing the Max-Cut (MC) and Travelling Salesperson Problem (TSP). The benchmarks employ classical mathematical algorithms, such as Branch-and-Cut (BNC) solvers, classical heuristics, Quantum Annealing (QA), and the Quantum Approximate Optimization Algorithm (QAOA).
- [713] arXiv:2405.07628 (cross-list from econ.TH) [pdf, ps, other]
-
Title: Substitutability, equilibrium transport, and matching modelsSubjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)
This chapter explores the role of substitutability in economic models, particularly in the context of optimal transport and matching models. In equilibrium models with substitutability, market-clearing prices can often be recovered using coordinate update methods such as Jacobi's algorithm. We provide a detailed mathematical analysis of models with substitutability through the lens of Z- and M-functions, in particular regarding their role in ensuring the convergence of Jacobi's algorithm. The chapter proceeds by studying matching models using substitutability, first focusing on models with (imperfectly) transferable utility, and then on models with non-transferable utility. In both cases, the text reviews theoretical implications as well as computational approaches (Sinkhorn, Gale--Shapley), and highlights a practical economic application.
- [714] arXiv:2405.07635 (cross-list from math.DS) [pdf, ps, other]
-
Title: Koopman Analysis of the Singularly-Perturbed van der Pol OscillatorComments: 21 pages, 10 figuresSubjects: Dynamical Systems (math.DS); Systems and Control (eess.SY)
The Koopman operator framework holds promise for spectral analysis of nonlinear dynamical systems based on linear operators. Eigenvalues and eigenfunctions of the Koopman operator, so-called Koopman eigenvalues and Koopman eigenfunctions, respectively, mirror global properties of the system's flow. In this paper we perform the Koopman analysis of the singularly-perturbed van der Pol system. First, we show the spectral signature depending on singular perturbation: how two Koopman principle eigenvalues are ordered and what distinct shapes emerge in their associated Koopman eigenfunctions. Second, we discuss the singular limit of the Koopman operator, which is derived through the concatenation of Koopman operators for the fast and slow subsystems. From the spectral properties of the Koopman operator for the singularl-perturbed system and the singular limit, we suggest that the Koopman eigenfunctions inherit geometric properties of the singularly-perturbed system. These results are applicable to general planar singularly-perturbed systems with stable limit cycles.
- [715] arXiv:2405.07636 (cross-list from math.OC) [pdf, ps, other]
-
Title: Nonlinear Network Identifiability with Full ExcitationsComments: 12 pages, 5 figures, submitted to IEEE Transactions on Automatic ControlSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We derive conditions for the identifiability of nonlinear networks characterized by additive dynamics at the level of the edges when all the nodes are excited. In contrast to linear systems, we show that the measurement of all sinks is necessary and sufficient for the identifiability of directed acyclic graphs, under the assumption that dynamics are described by analytic functions without constant terms (i.e., $f(0)=0$). But if constant terms are present, then the identifiability is impossible as soon as one node has more than one in-neighbor. In the case of general digraphs where cycles can exist, we consider additively separable functions for the analysis of the identifiability, and we show that the measurement of one node of all the sinks of the condensation digraph is necessary and sufficient. Several examples are added to illustrate the results.
- [716] arXiv:2405.07641 (cross-list from eess.AS) [pdf, ps, other]
-
Title: Evaluating Speech Enhancement Systems Through Listening EffortSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Understanding degraded speech is demanding, requiring increased listening effort (LE). Evaluating processed and unprocessed speech with respect to LE can objectively indicate if speech enhancement systems benefit listeners. However, existing methods for measuring LE are complex and not widely applicable. In this study, we propose a simple method to evaluate speech intelligibility and LE simultaneously without additional strain on subjects or operators. We assess this method using results from two independent studies in Norway and Denmark, testing 76 (50+26) subjects across 9 (6+3) processing conditions. Despite differences in evaluation setups, subject recruitment, and processing systems, trends are strikingly similar, demonstrating the proposed method's robustness and ease of implementation into existing practices.
- [717] arXiv:2405.07649 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Efficient Matrix Factorization Via Householder ReflectionsComments: Submitted to IEEE ITW, 2024Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Motivated by orthogonal dictionary learning problems, we propose a novel method for matrix factorization, where the data matrix $\mathbf{Y}$ is a product of a Householder matrix $\mathbf{H}$ and a binary matrix $\mathbf{X}$. First, we show that the exact recovery of the factors $\mathbf{H}$ and $\mathbf{X}$ from $\mathbf{Y}$ is guaranteed with $\Omega(1)$ columns in $\mathbf{Y}$ . Next, we show approximate recovery (in the $l\infty$ sense) can be done in polynomial time($O(np)$) with $\Omega(\log n)$ columns in $\mathbf{Y}$ . We hope the techniques in this work help in developing alternate algorithms for orthogonal dictionary learning.
- [718] arXiv:2405.07657 (cross-list from physics.bio-ph) [pdf, ps, other]
-
Title: Beyond traditional Magnetic Resonance processing with Artificial IntelligenceAmir Jahangiri, Vladislav Orekhov (Department of Chemistry and Molecular Biology, Swedish NMR Centre, University of Gothenburg, Sweden)Subjects: Biological Physics (physics.bio-ph); Machine Learning (cs.LG); Signal Processing (eess.SP)
Smart signal processing approaches using Artificial Intelligence are gaining momentum in NMR applications. In this study, we demonstrate that AI offers new opportunities beyond tasks addressed by traditional techniques. We developed and trained several artificial neural networks in our new toolbox Magnetic Resonance with Artificial intelligence (MR-Ai) to solve three "impossible" problems: quadrature detection using only Echo (or Anti-Echo) modulation from the traditional Echo/Anti-Echo scheme; accessing uncertainty of signal intensity at each point in a spectrum processed by any given method; and defining a reference-free score for quantitative access of NMR spectrum quality. Our findings highlight the potential of AI techniques to revolutionize NMR processing and analysis.
- [719] arXiv:2405.07674 (cross-list from eess.IV) [pdf, ps, other]
-
Title: CoVScreen: Pitfalls and recommendations for screening COVID-19 using Chest X-raysComments: 21 pagesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The novel coronavirus (COVID-19), a highly infectious respiratory disease caused by the SARS-CoV-2 has emerged as an unprecedented healthcare crisis. The pandemic had a devastating impact on the health, well-being, and economy of the global population. Early screening and diagnosis of symptomatic patients plays crucial role in isolation of patient to help stop community transmission as well as providing early treatment helping in reducing the mortality rate. Although, the RT-PCR test is the gold standard for COVID-19 testing, it is a manual, laborious, time consuming, uncomfortable, and invasive process. Due to its accessibility, availability, lower-cost, ease of sanitisation, and portable setup, chest X-Ray imaging can serve as an effective screening and diagnostic tool. In this study, we first highlight limitations of existing datasets and studies in terms of data quality, data imbalance, and evaluation strategy. Second, we curated a large-scale COVID-19 chest X-ray dataset from many publicly available COVID-19 imaging databases and proposed a pre-processing pipeline to improve quality of the dataset. We proposed CoVScreen, an CNN architecture to train and test the curated dataset. The experimental results applying different classification scenarios on the curated dataset in terms of various evaluation metrics demonstrate the effectiveness of proposed methodology in the screening of COVID-19 infection.
- [720] arXiv:2405.07724 (cross-list from math.CT) [pdf, ps, other]
-
Title: Monoidal closure of Grothendieck constructions via $\Sigma$-tractible monoidal structures and Dialectica formulasSubjects: Category Theory (math.CT); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
We study the categorical structure of the Grothendieck construction of an indexed category $\mathcal{L}:\mathcal{C}^{op}\to\mathbf{CAT}$ and characterise fibred limits, colimits, and monoidal structures. Next, we give sufficient conditions for the monoidal closure of the total category $\Sigma_\mathcal{C} \mathcal{L}$ of a Grothendieck construction of an indexed category $\mathcal{L}:\mathcal{C}^{op}\to\mathbf{CAT}$. Our analysis is a generalization of Gödel's Dialectica interpretation, and it relies on a novel notion of $\Sigma$-tractible monoidal structure. As we will see, $\Sigma$-tractible coproducts simultaneously generalize cocartesian coclosed structures, biproducts and extensive coproducts. We analyse when the closed structure is fibred -- usually it is not.
- [721] arXiv:2405.07735 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Federated Hierarchical Tensor Networks: a Collaborative Learning Quantum AI-Driven Framework for HealthcareComments: 12 pages, 8 figuresSubjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Healthcare industries frequently handle sensitive and proprietary data, and due to strict privacy regulations, they are often reluctant to share data directly. In today's context, Federated Learning (FL) stands out as a crucial remedy, facilitating the rapid advancement of distributed machine learning while effectively managing critical concerns regarding data privacy and governance. The fusion of federated learning and quantum computing represents a groundbreaking interdisciplinary approach with immense potential to revolutionize various industries, from healthcare to finance. In this work, we proposed a federated learning framework based on quantum tensor networks, which leverages the principles of many-body quantum physics. Currently, there are no known classical tensor networks implemented in federated settings. Furthermore, we investigated the effectiveness and feasibility of the proposed framework by conducting a differential privacy analysis to ensure the security of sensitive data across healthcare institutions. Experiments on popular medical image datasets show that the federated quantum tensor network model achieved a mean receiver-operator characteristic area under the curve (ROC-AUC) between 0.91-0.98. Experimental results demonstrate that the quantum federated global model, consisting of highly entangled tensor network structures, showed better generalization and robustness and achieved higher testing accuracy, surpassing the performance of locally trained clients under unbalanced data distributions among healthcare institutions.
- [722] arXiv:2405.07762 (cross-list from eess.IV) [pdf, ps, other]
-
Title: A method for supervoxel-wise association studies of age and other non-imaging variables from coronary computed tomography angiogramsComments: 34 pagesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The study of associations between an individual's age and imaging and non-imaging data is an active research area that attempts to aid understanding of the effects and patterns of aging. In this work we have conducted a supervoxel-wise association study between both volumetric and tissue density features in coronary computed tomography angiograms and the chronological age of a subject, to understand the localized changes in morphology and tissue density with age. To enable a supervoxel-wise study of volume and tissue density, we developed a novel method based on image segmentation, inter-subject image registration, and robust supervoxel-based correlation analysis, to achieve a statistical association study between the images and age. We evaluate the registration methodology in terms of the Dice coefficient for the heart chambers and myocardium, and the inverse consistency of the transformations, showing that the method works well in most cases with high overlap and inverse consistency. In a sex-stratified study conducted on a subset of $n=1388$ images from the SCAPIS study, the supervoxel-wise analysis was able to find localized associations with age outside of the commonly segmented and analyzed sub-regions, and several substantial differences between the sexes in association of age and volume.
- [723] arXiv:2405.07770 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Hype or Heuristic? Quantum Reinforcement Learning for Join Order OptimisationSubjects: Quantum Physics (quant-ph); Databases (cs.DB); Machine Learning (cs.LG)
Identifying optimal join orders (JOs) stands out as a key challenge in database research and engineering. Owing to the large search space, established classical methods rely on approximations and heuristics. Recent efforts have successfully explored reinforcement learning (RL) for JO. Likewise, quantum versions of RL have received considerable scientific attention. Yet, it is an open question if they can achieve sustainable, overall practical advantages with improved quantum processors.
In this paper, we present a novel approach that uses quantum reinforcement learning (QRL) for JO based on a hybrid variational quantum ansatz. It is able to handle general bushy join trees instead of resorting to simpler left-deep variants as compared to approaches based on quantum(-inspired) optimisation, yet requires multiple orders of magnitudes fewer qubits, which is a scarce resource even for post-NISQ systems.
Despite moderate circuit depth, the ansatz exceeds current NISQ capabilities, which requires an evaluation by numerical simulations. While QRL may not significantly outperform classical approaches in solving the JO problem with respect to result quality (albeit we see parity), we find a drastic reduction in required trainable parameters. This benefits practically relevant aspects ranging from shorter training times compared to classical RL, less involved classical optimisation passes, or better use of available training data, and fits data-stream and low-latency processing scenarios. Our comprehensive evaluation and careful discussion delivers a balanced perspective on possible practical quantum advantage, provides insights for future systemic approaches, and allows for quantitatively assessing trade-offs of quantum approaches for one of the most crucial problems of database management systems. - [724] arXiv:2405.07790 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Hamiltonian-based Quantum Reinforcement Learning for Neural Combinatorial OptimizationSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
Advancements in Quantum Computing (QC) and Neural Combinatorial Optimization (NCO) represent promising steps in tackling complex computational challenges. On the one hand, Variational Quantum Algorithms such as QAOA can be used to solve a wide range of combinatorial optimization problems. On the other hand, the same class of problems can be solved by NCO, a method that has shown promising results, particularly since the introduction of Graph Neural Networks. Given recent advances in both research areas, we introduce Hamiltonian-based Quantum Reinforcement Learning (QRL), an approach at the intersection of QC and NCO. We model our ansatzes directly on the combinatorial optimization problem's Hamiltonian formulation, which allows us to apply our approach to a broad class of problems. Our ansatzes show favourable trainability properties when compared to the hardware efficient ansatzes, while also not being limited to graph-based problems, unlike previous works. In this work, we evaluate the performance of Hamiltonian-based QRL on a diverse set of combinatorial optimization problems to demonstrate the broad applicability of our approach and compare it to QAOA.
- [725] arXiv:2405.07795 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Improved Bound for Robust Causal Bandits with Linear ModelsComments: 11 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2310.19794Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This paper investigates the robustness of causal bandits (CBs) in the face of temporal model fluctuations. This setting deviates from the existing literature's widely-adopted assumption of constant causal models. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown and subject to variations over time. The goal is to design a sequence of interventions that incur the smallest cumulative regret compared to an oracle aware of the entire causal model and its fluctuations. A robust CB algorithm is proposed, and its cumulative regret is analyzed by establishing both upper and lower bounds on the regret. It is shown that in a graph with maximum in-degree $d$, length of the largest causal path $L$, and an aggregate model deviation $C$, the regret is upper bounded by $\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{T} + C))$ and lower bounded by $\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T}\; ,\; d^2C\})$. The proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$, maintaining sub-linear regret for a broad range of $C$.
- [726] arXiv:2405.07842 (cross-list from astro-ph.IM) [pdf, ps, other]
-
Title: Ground-based Image Deconvolution with Swin Transformer UNetComments: 11 pages, 14 figuresSubjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV)
As ground-based all-sky astronomical surveys will gather millions of images in the coming years, a critical requirement emerges for the development of fast deconvolution algorithms capable of efficiently improving the spatial resolution of these images. By successfully recovering clean and high-resolution images from these surveys, our objective is to help deepen our understanding of galaxy formation and evolution through accurate photometric measurements. We introduce a two-step deconvolution framework using a Swin Transformer architecture. Our study reveals that the deep learning-based solution introduces a bias, constraining the scope of scientific analysis. To address this limitation, we propose a novel third step relying on the active coefficients in the sparsity wavelet framework. By conducting a performance comparison between our deep learning-based method and Firedec, a classical deconvolution algorithm, we analyze a subset of the EDisCS cluster samples. We demonstrate the advantage of our method in terms of resolution recovery, generalization to different noise properties, and computational efficiency. Not only does the analysis of this cluster sample assess the efficiency of our method, but it also enables us to quantify the number of clumps within these galaxies in relation to their disc colour. This robust technique holds promise for identifying structures in the distant universe from ground-based images.
- [727] arXiv:2405.07843 (cross-list from math.CO) [pdf, ps, other]
-
Title: An almost complete $t$-intersection theorem for permutationsSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
For any $\epsilon>0$ and $n>(1+\epsilon)t$, $n>n_0(\epsilon)$ we determine the size of the largest $t$-intersecting family of permutations, as well as give a sharp stability result. This resolves a conjecture of Ellis, Friedgut and Pilpel (2011) and shows the validity of conjectures of Frankl and Deza (1977) and Cameron (1988) for $n>(1+\epsilon )t$. We note that, for this range of parameters, the extremal examples are not necessarily trivial, and that our statement is analogous to the celebrated Ahlswede-Khachatrian theorem. The proof is based on the refinement of the method of spread approximations, recently introduced by Kupavskii and Zakharov (2022).
- [728] arXiv:2405.07854 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Using Multiparametric MRI with Optimized Synthetic Correlated Diffusion Imaging to Enhance Breast Cancer Pathologic Complete Response PredictionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In 2020, 685,000 deaths across the world were attributed to breast cancer, underscoring the critical need for innovative and effective breast cancer treatment. Neoadjuvant chemotherapy has recently gained popularity as a promising treatment strategy for breast cancer, attributed to its efficacy in shrinking large tumors and leading to pathologic complete response. However, the current process to recommend neoadjuvant chemotherapy relies on the subjective evaluation of medical experts which contain inherent biases and significant uncertainty. A recent study, utilizing volumetric deep radiomic features extracted from synthetic correlated diffusion imaging (CDI$^s$), demonstrated significant potential in noninvasive breast cancer pathologic complete response prediction. Inspired by the positive outcomes of optimizing CDI$^s$ for prostate cancer delineation, this research investigates the application of optimized CDI$^s$ to enhance breast cancer pathologic complete response prediction. Using multiparametric MRI that fuses optimized CDI$^s$ with diffusion-weighted imaging (DWI), we obtain a leave-one-out cross-validation accuracy of 93.28%, over 5.5% higher than that previously reported.
- [729] arXiv:2405.07861 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion ImagingSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Breast cancer was diagnosed for over 7.8 million women between 2015 to 2020. Grading plays a vital role in breast cancer treatment planning. However, the current tumor grading method involves extracting tissue from patients, leading to stress, discomfort, and high medical costs. A recent paper leveraging volumetric deep radiomic features from synthetic correlated diffusion imaging (CDI$^s$) for breast cancer grade prediction showed immense promise for noninvasive methods for grading. Motivated by the impact of CDI$^s$ optimization for prostate cancer delineation, this paper examines using optimized CDI$^s$ to improve breast cancer grade prediction. We fuse the optimized CDI$^s$ signal with diffusion-weighted imaging (DWI) to create a multiparametric MRI for each patient. Using a larger patient cohort and training across all the layers of a pretrained MONAI model, we achieve a leave-one-out cross-validation accuracy of 95.79%, over 8% higher compared to that previously reported.
- [730] arXiv:2405.07869 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Enhancing Clinically Significant Prostate Cancer Prediction in T2-weighted Images through Transfer Learning from Breast CancerSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In 2020, prostate cancer saw a staggering 1.4 million new cases, resulting in over 375,000 deaths. The accurate identification of clinically significant prostate cancer is crucial for delivering effective treatment to patients. Consequently, there has been a surge in research exploring the application of deep neural networks to predict clinical significance based on magnetic resonance images. However, these networks demand extensive datasets to attain optimal performance. Recently, transfer learning emerged as a technique that leverages acquired features from a domain with richer data to enhance the performance of a domain with limited data. In this paper, we investigate the improvement of clinically significant prostate cancer prediction in T2-weighted images through transfer learning from breast cancer. The results demonstrate a remarkable improvement of over 30% in leave-one-out cross-validation accuracy.
- [731] arXiv:2405.07879 (cross-list from stat.AP) [pdf, ps, other]
-
Title: On the Relation Between Autoencoders and Non-negative Matrix Factorization, and Their Application for Mutational Signature ExtractionSubjects: Applications (stat.AP); Machine Learning (cs.LG)
The aim of this study is to provide a foundation to understand the relationship between non-negative matrix factorization (NMF) and non-negative autoencoders enabling proper interpretation and understanding of autoencoder-based alternatives to NMF. Since its introduction, NMF has been a popular tool for extracting interpretable, low-dimensional representations of high-dimensional data. However, recently, several studies have proposed to replace NMF with autoencoders. This increasing popularity of autoencoders warrants an investigation on whether this replacement is in general valid and reasonable. Moreover, the exact relationship between non-negative autoencoders and NMF has not been thoroughly explored. Thus, a main aim of this study is to investigate in detail the relationship between non-negative autoencoders and NMF. We find that the connection between the two models can be established through convex NMF, which is a restricted case of NMF. In particular, convex NMF is a special case of an autoencoder. The performance of NMF and autoencoders is compared within the context of extraction of mutational signatures from cancer genomics data. We find that the reconstructions based on NMF are more accurate compared to autoencoders, while the signatures extracted using both methods show comparable consistencies and values when externally validated. These findings suggest that the non-negative autoencoders investigated in this article do not provide an improvement of NMF in the field of mutational signature extraction.
- [732] arXiv:2405.07888 (cross-list from math-ph) [pdf, ps, other]
-
Title: The fermionic massless modular HamiltonianComments: 22 pages, no figuresSubjects: Mathematical Physics (math-ph); Information Theory (cs.IT); Operator Algebras (math.OA)
We provide an explicit expression for the modular hamiltonian of the von Neumann algebras associated to the unit double cone for the (fermionic) quantum field theories of the 2-component Weyl (helicity 1/2) field, and of the 4-component massless Dirac and Majorana fields. To this end, we represent the one particle spaces of these theories in terms of solutions of the corresponding wave equations, and obtain the action of the modular group on them. As an application, we compute the relative entropy between the vacuum of the massless Majorana field and one particle states associated to waves with Cauchy data localized in the spatial unit ball.
- [733] arXiv:2405.07890 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Subspace-Informed Matrix CompletionComments: arXiv admin note: text overlap with arXiv:2111.00235Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this work, we consider the matrix completion problem, where the objective is to reconstruct a low-rank matrix from a few observed entries. A commonly employed approach involves nuclear norm minimization. For this method to succeed, the number of observed entries needs to scale at least proportional to both the rank of the ground-truth matrix and the coherence parameter. While the only prior information is oftentimes the low-rank nature of the ground-truth matrix, in various real-world scenarios, additional knowledge about the ground-truth low-rank matrix is available. For instance, in collaborative filtering, Netflix problem, and dynamic channel estimation in wireless communications, we have partial or full knowledge about the signal subspace in advance. Specifically, we are aware of some subspaces that form multiple angles with the column and row spaces of the ground-truth matrix. Leveraging this valuable information has the potential to significantly reduce the required number of observations. To this end, we introduce a multi-weight nuclear norm optimization problem that concurrently promotes the low-rank property as well the information about the available subspaces. The proposed weights are tailored to penalize each angle corresponding to each basis of the prior subspace independently. We further propose an optimal weight selection strategy by minimizing the coherence parameter of the ground-truth matrix, which is equivalent to minimizing the required number of observations. Simulation results validate the advantages of incorporating multiple weights in the completion procedure. Specifically, our proposed multi-weight optimization problem demonstrates a substantial reduction in the required number of observations compared to the state-of-the-art methods.
- [734] arXiv:2405.07895 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Optimal Transmitter Design and Pilot Spacing in MIMO Non-Stationary Aging ChannelsSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This work considers an uplink wireless communication system where multiple users with multiple antennas transmit data frames over dynamic channels. Previous studies have shown that multiple transmit and receive antennas can substantially enhance the sum-capacity of all users when the channel is known at the transmitter and in the case of uncorrelated transmit and receive antennas. However, spatial correlations stemming from close proximity of transmit antennas and channel variation between pilot and data time slots, known as channel aging, can substantially degrade the transmission rate if they are not properly into account. In this work, we provide an analytical framework to concurrently exploit both of these features. Specifically, we first propose a beamforming framework to capture spatial correlations. Then, based on random matrix theory tools, we introduce a deterministic expression that approximates the average sum-capacity of all users. Subsequently, we obtain the optimal values of pilot spacing and beamforming vectors upon maximizing this expression. Simulation results show the impacts of path loss, velocity of mobile users and Rician factor on the resulting sum-capacity and underscore the efficacy of our methodology compared to prior works.
- [735] arXiv:2405.07898 (cross-list from physics.comp-ph) [pdf, ps, other]
-
Title: Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale SystemKylee Santos, Stan Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan Thompson, Delyan Z Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A Leon, James H Laros III, Michael James, Sivasankaran RajamanickamComments: 10 pages, 10 figures, 5 tablesSubjects: Computational Physics (physics.comp-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)
Molecular dynamics (MD) simulations have transformed our understanding of the nanoscale, driving breakthroughs in materials science, computational chemistry, and several other fields, including biophysics and drug design. Even on exascale supercomputers, however, runtimes are excessive for systems and timescales of scientific interest. Here, we demonstrate strong scaling of MD simulations on the Cerebras Wafer-Scale Engine. By dedicating a processor core for each simulated atom, we demonstrate a 179-fold improvement in timesteps per second versus the Frontier GPU-based Exascale platform, along with a large improvement in timesteps per unit energy. Reducing every year of runtime to two days unlocks currently inaccessible timescales of slow microstructure transformation processes that are critical for understanding material behavior and function. Our dataflow algorithm runs Embedded Atom Method (EAM) simulations at rates over 270,000 timesteps per second for problems with up to 800k atoms. This demonstrated performance is unprecedented for general-purpose processing cores.
- [736] arXiv:2405.07905 (cross-list from eess.IV) [pdf, ps, other]
-
Title: PLUTO: Pathology-Universal TransformerDinkar Juyal, Harshith Padigela, Chintan Shah, Daniel Shenker, Natalia Harguindeguy, Yi Liu, Blake Martin, Yibo Zhang, Michael Nercessian, Miles Markey, Isaac Finberg, Kelsey Luu, Daniel Borders, Syed Ashar Javed, Emma Krause, Raymond Biju, Aashish Sood, Allen Ma, Jackson Nyman, John Shamshoian, Guillaume Chhor, Darpan Sanghavi, Marc Thibault, Limin Yu, Fedaa Najdawi, Jennifer A. Hipp, Darren Fahy, Benjamin Glass, Eric Walk, John Abel, Harsha Pokkalla, Andrew H. Beck, Sean GrullonSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Pathology is the study of microscopic inspection of tissue, and a pathology diagnosis is often the medical gold standard to diagnose disease. Pathology images provide a unique challenge for computer-vision-based analysis: a single pathology Whole Slide Image (WSI) is gigapixel-sized and often contains hundreds of thousands to millions of objects of interest across multiple resolutions. In this work, we propose PathoLogy Universal TransfOrmer (PLUTO): a light-weight pathology FM that is pre-trained on a diverse dataset of 195 million image tiles collected from multiple sites and extracts meaningful representations across multiple WSI scales that enable a large variety of downstream pathology tasks. In particular, we design task-specific adaptation heads that utilize PLUTO's output embeddings for tasks which span pathology scales ranging from subcellular to slide-scale, including instance segmentation, tile classification, and slide-level prediction. We compare PLUTO's performance to other state-of-the-art methods on a diverse set of external and internal benchmarks covering multiple biologically relevant tasks, tissue types, resolutions, stains, and scanners. We find that PLUTO matches or outperforms existing task-specific baselines and pathology-specific foundation models, some of which use orders-of-magnitude larger datasets and model sizes when compared to PLUTO. Our findings present a path towards a universal embedding to power pathology image analysis, and motivate further exploration around pathology foundation models in terms of data diversity, architectural improvements, sample efficiency, and practical deployability in real-world applications.
- [737] arXiv:2405.07961 (cross-list from math.PR) [pdf, ps, other]
-
Title: Cloaking for random walks using a discrete potential theoryComments: 24 pages, 8 figuresSubjects: Probability (math.PR); Numerical Analysis (math.NA)
The diffusion of charged particles in a graph can be modeled using random walks on a weighted graph. We give strategies to hide (or cloak) changes in a subgraph from the perspective of measurements of expected net particle charges made at nodes away from the cloaked subgraph. We distinguish between passive and active strategies, depending on whether the strategy involves injecting particles. The passive strategy can hide topology and edge weight changes. In addition to these capabilities, the active strategy can also hide sources of particles, at the cost of prior knowledge of the expected net particle charges in the reference graph. The strategies we present rely on discrete analogues of classic potential theory, that include a Calderón calculus on graphs.
- [738] arXiv:2405.07965 (cross-list from math.OC) [pdf, ps, other]
-
Title: Fast Computation of Superquantile-Constrained Optimization Through Implicit Scenario ReductionComments: 34 pages, 2 figuresSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
Superquantiles have recently gained significant interest as a risk-aware metric for addressing fairness and distribution shifts in statistical learning and decision making problems. This paper introduces a fast, scalable and robust second-order computational framework to solve large-scale optimization problems with superquantile-based constraints. Unlike empirical risk minimization, superquantile-based optimization requires ranking random functions evaluated across all scenarios to compute the tail conditional expectation. While this tail-based feature might seem computationally unfriendly, it provides an advantageous setting for a semismooth-Newton-based augmented Lagrangian method. The superquantile operator effectively reduces the dimensions of the Newton systems since the tail expectation involves considerably fewer scenarios. Notably, the extra cost of obtaining relevant second-order information and performing matrix inversions is often comparable to, and sometimes even less than, the effort required for gradient computation. Our developed solver is particularly effective when the number of scenarios substantially exceeds the number of decision variables. In synthetic problems with linear and convex diagonal quadratic objectives, numerical experiments demonstrate that our method outperforms existing approaches by a large margin: It achieves speeds more than 750 times faster for linear and quadratic objectives than the alternating direction method of multipliers as implemented by OSQP for computing low-accuracy solutions. Additionally, it is up to 25 times faster for linear objectives and 70 times faster for quadratic objectives than the commercial solver Gurobi, and 20 times faster for linear objectives and 30 times faster for quadratic objectives than the Portfolio Safeguard optimization suite for high-accuracy solution computations.
- [739] arXiv:2405.07971 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Sensitivity Analysis for Active Sampling, with Applications to the Simulation of Analog CircuitsComments: 7 pagesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)
We propose an active sampling flow, with the use-case of simulating the impact of combined variations on analog circuits. In such a context, given the large number of parameters, it is difficult to fit a surrogate model and to efficiently explore the space of design features.
By combining a drastic dimension reduction using sensitivity analysis and Bayesian surrogate modeling, we obtain a flexible active sampling flow. On synthetic and real datasets, this flow outperforms the usual Monte-Carlo sampling which often forms the foundation of design space exploration. - [740] arXiv:2405.07976 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Localized Adaptive Risk ControlSubjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Adaptive Risk Control (ARC) is an online calibration strategy based on set prediction that offers worst-case deterministic long-term risk control, as well as statistical marginal coverage guarantees. ARC adjusts the size of the prediction set by varying a single scalar threshold based on feedback from past decisions. In this work, we introduce Localized Adaptive Risk Control (L-ARC), an online calibration scheme that targets statistical localized risk guarantees ranging from conditional risk to marginal risk, while preserving the worst-case performance of ARC. L-ARC updates a threshold function within a reproducing kernel Hilbert space (RKHS), with the kernel determining the level of localization of the statistical risk guarantee. The theoretical results highlight a trade-off between localization of the statistical risk and convergence speed to the long-term risk target. Thanks to localization, L-ARC is demonstrated via experiments to produce prediction sets with risk guarantees across different data subpopulations, significantly improving the fairness of the calibrated model for tasks such as image segmentation and beam selection in wireless networks.
- [741] arXiv:2405.07977 (cross-list from q-bio.QM) [pdf, ps, other]
-
Title: A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of ConfoundsAnton Orlichenko, Gang Qu, Ziyu Zhou, Anqi Liu, Hong-Wen Deng, Zhengming Ding, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Yu-Ping WangComments: 12 pagesSubjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. An exception are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics. Significance: Our DemoVAE model allows for generation of high quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds.
Cross submissions for Tuesday, 14 May 2024 (showing 107 of 107 entries )
- [742] arXiv:1405.1300 (replaced) [pdf, ps, other]
-
Title: The calculation of air filtration efficiency through the visual basic programming languageComments: 8 pages, 2 tables, 5 figures, 13 equationsSubjects: Computational Engineering, Finance, and Science (cs.CE)
The present article describes the development of a software which was written in visual basic programming language. The software calculates the particle collection efficiency and penetration of a fibrous filter medium for given values of particle diameter, fiber diameter filter thickness, volume fraction etc, during the process of air filtration. The progress of the development of software is divided into two steps, the design of graphical user interface (GUI) and the code writing. The code is mainly based on the mathematical model of particle collection efficiency of fibrous filters media.
- [743] arXiv:1902.03667 (replaced) [pdf, ps, html, other]
-
Title: Differential Similarity in Higher Dimensional Spaces: Theory and ApplicationsComments: 67 pages, 30 figures. Revised to match the published version of arXiv:1401.2411 [cs.LG]Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper presents an extension and an elaboration of the theory of differential similarity, which was originally proposed in arXiv:1401.2411 [cs.LG]. The goal is to develop an algorithm for clustering and coding that combines a geometric model with a probabilistic model in a principled way. For simplicity, the geometric model in the earlier paper was restricted to the three-dimensional case. The present paper removes this restriction, and considers the full $n$-dimensional case. Although the mathematical model is the same, the strategies for computing solutions in the $n$-dimensional case are different, and one of the main purposes of this paper is to develop and analyze these strategies. Another main purpose is to devise techniques for estimating the parameters of the model from sample data, again in $n$ dimensions. We evaluate the solution strategies and the estimation techniques by applying them to two familiar real-world examples: the classical MNIST dataset and the CIFAR-10 dataset.
- [744] arXiv:1907.13538 (replaced) [pdf, ps, html, other]
-
Title: LassoNet: Deep Lasso-Selection of 3D Point CloudsComments: 10 pagesJournal-ref: TVCG2019Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
Selection is a fundamental task in exploratory analysis and visualization of 3D point clouds. Prior researches on selection methods were developed mainly based on heuristics such as local point density, thus limiting their applicability in general data. Specific challenges root in the great variabilities implied by point clouds (e.g., dense vs. sparse), viewpoint (e.g., occluded vs. non-occluded), and lasso (e.g., small vs. large). In this work, we introduce LassoNet, a new deep neural network for lasso selection of 3D point clouds, attempting to learn a latent mapping from viewpoint and lasso to point cloud regions. To achieve this, we couple user-target points with viewpoint and lasso information through 3D coordinate transform and naive selection, and improve the method scalability via an intention filtering and farthest point sampling. A hierarchical network is trained using a dataset with over 30K lasso-selection records on two different point cloud data. We conduct a formal user study to compare LassoNet with two state-of-the-art lasso-selection methods. The evaluations confirm that our approach improves the selection effectiveness and efficiency across different combinations of 3D point clouds, viewpoints, and lasso selections. Project Website: this https URL
- [745] arXiv:2009.13521 (replaced) [pdf, ps, other]
-
Title: Zero-Knowledge GamesComments: 11 pages, 1 figure, 5 tablesSubjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)
This paper introduces Markov processes for zero-knowledge games and models Nash equilibrium found in such games. Symmetric games are shown to have an analog zero-knowledge game when confidence in a player being informed is considered with respect to trust in propositions and proofs.
- [746] arXiv:2011.04324 (replaced) [pdf, ps, html, other]
-
Title: Streaming Algorithms for Geometric Steiner ForestSubjects: Data Structures and Algorithms (cs.DS)
We consider an important generalization of the Steiner tree problem, the \emph{Steiner forest problem}, in the Euclidean plane: the input is a multiset $X \subseteq \mathbb{R}^2$, partitioned into $k$ color classes $C_1, C_2, \ldots, C_k \subseteq X$. The goal is to find a minimum-cost Euclidean graph $G$ such that every color class $C_i$ is connected in $G$. We study this Steiner forest problem in the streaming setting, where the stream consists of insertions and deletions of points to $X$. Each input point $x\in X$ arrives with its color $\textsf{color}(x) \in [k]$, and as usual for dynamic geometric streams, the input points are restricted to the discrete grid $\{0, \ldots, \Delta\}^2$.
We design a single-pass streaming algorithm that uses $\mathrm{poly}(k \cdot \log\Delta)$ space and time, and estimates the cost of an optimal Steiner forest solution within ratio arbitrarily close to the famous Euclidean Steiner ratio $\alpha_2$ (currently $1.1547 \le \alpha_2 \le 1.214$). This approximation guarantee matches the state-of-the-art bound for streaming Steiner tree, i.e., when $k=1$, and it is a major open question to improve the ratio to $1 + \epsilon$ even for this special case. Our approach relies on a novel combination of streaming techniques, like sampling and linear sketching, with the classical Arora-style dynamic-programming framework for geometric optimization problems, which usually requires large memory and has so far not been applied in the streaming setting.
We complement our streaming algorithm for the Steiner forest problem with simple arguments showing that any finite approximation requires $\Omega(k)$ bits of space. - [747] arXiv:2012.01954 (replaced) [pdf, ps, other]
-
Title: Robust Path-following for Keplerian OrbitsComments: Under reviewSubjects: Systems and Control (eess.SY)
This work introduces a novel path-following control strategy inspired by the famous two-body problem, aiming to stabilize any Keplerian orbit. Utilizing insights from the mathematical structure of the two-body problem, we derive a robust path-following law adopting sliding mode control theory to achieve asymptotic convergence to bounded disturbances. The resulting control law is demonstrated to be asymptotically stable. Illustrative examples showcase its applicability, including orbiting an accelerated moving point, patching Keplerian trajectories for complex patterns, and orbital maintenance around the asteroid Itokawa. The proposed control law offers a significant advantage for the orbital station-keeping problem, as its sliding surface is formulated based on variables commonly used to define orbital dynamics. This inherent alignment facilitates easy application to orbital station-keeping scenarios.
- [748] arXiv:2103.03864 (replaced) [pdf, ps, other]
-
Title: Learning to Extend Molecular Scaffolds with Structural MotifsKrzysztof Maziarz, Henry Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, Marc BrockschmidtComments: Published at the 10th International Conference on Learning Representations (ICLR 2022)Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored. Here, we propose MoLeR, a graph-based model that naturally supports scaffolds as initial seed of the generative procedure, which is possible because it is not conditioned on the generation history. Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance.
- [749] arXiv:2104.10751 (replaced) [pdf, ps, other]
-
Title: Rule Generation for Classification: Scalability, Interpretability, and FairnessSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a new rule-based optimization method for classification with constraints. The proposed method leverages column generation for linear programming, and hence, is scalable to large datasets. The resulting pricing subproblem is shown to be NP-Hard. We recourse to a decision tree-based heuristic and solve a proxy pricing subproblem for acceleration. The method returns a set of rules along with their optimal weights indicating the importance of each rule for learning. We address interpretability and fairness by assigning cost coefficients to the rules and introducing additional constraints. In particular, we focus on local interpretability and generalize separation criterion in fairness to multiple sensitive attributes and classes. We test the performance of the proposed methodology on a collection of datasets and present a case study to elaborate on its different aspects. The proposed rule-based learning method exhibits a good compromise between local interpretability and fairness on the one side, and accuracy on the other side.
- [750] arXiv:2108.03812 (replaced) [pdf, ps, html, other]
-
Title: Low-Complexity Algorithm for Restless Bandits with Imperfect ObservationsSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
We consider a class of restless bandit problems that finds a broad application area in reinforcement learning and stochastic optimization. We consider $N$ independent discrete-time Markov processes, each of which had two possible states: 1 and 0 (`good' and `bad'). Only if a process is both in state 1 and observed to be so does reward accrue. The aim is to maximize the expected discounted sum of returns over the infinite horizon subject to a constraint that only $M$ $(<N)$ processes may be observed at each step. Observation is error-prone: there are known probabilities that state 1 (0) will be observed as 0 (1). From this one knows, at any time $t$, a probability that process $i$ is in state 1. The resulting system may be modeled as a restless multi-armed bandit problem with an information state space of uncountable cardinality. Restless bandit problems with even finite state spaces are PSPACE-HARD in general. We propose a novel approach for simplifying the dynamic programming equations of this class of restless bandits and develop a low-complexity algorithm that achieves a strong performance and is readily extensible to the general restless bandit model with observation errors. Under certain conditions, we establish the existence (indexability) of Whittle index and its equivalence to our algorithm. When those conditions do not hold, we show by numerical experiments the near-optimal performance of our algorithm in the general parametric space. Furthermore, we theoretically prove the optimality of our algorithm for homogeneous systems.
- [751] arXiv:2108.13836 (replaced) [pdf, ps, other]
-
Title: Explainable AI for engineering design: A unified approach of systems engineering and component-based deep learningComments: 18 pagesSubjects: Machine Learning (cs.LG); Software Engineering (cs.SE); Systems and Control (eess.SY)
Data-driven models created by machine learning, gain in importance in all fields of design and engineering. They, have high potential to assist decision-makers in creating novel, artefacts with better performance and sustainability. However,, limited generalization and the black-box nature of these models, lead to limited explainability and reusability. To overcome this, situation, we propose a component-based approach to create, partial component models by machine learning (ML). This, component-based approach aligns deep learning with systems, engineering (SE). The key contribution of the component-based, method is that activations at interfaces between the components, are interpretable engineering quantities. In this way, the, hierarchical component system forms a deep neural network, (DNN) that a priori integrates information for engineering, explainability. The, approach adapts the model structure to engineering methods of, systems engineering and to domain knowledge. We examine the, performance of the approach by the field of energy-efficient, building design: First, we observed better generalization of the, component-based method by analyzing prediction accuracy, outside the training data. Especially for representative designs, different in structure, we observe a much higher accuracy, (R2 = 0.94) compared to conventional monolithic methods, (R2 = 0.71). Second, we illustrate explainability by exemplary, demonstrating how sensitivity information from SE and rules, from low-depth decision trees serve engineering. Third, we, evaluate explainability by qualitative and quantitative methods, demonstrating the matching of preliminary knowledge and data-driven, derived strategies and show correctness of activations at, component interfaces compared to white-box simulation results, (envelope components: R2 = 0.92..0.99; zones: R2 = 0.78..0.93).
- [752] arXiv:2110.02830 (replaced) [pdf, ps, other]
-
Title: Parameterized Algorithms for the Steiner Arborescence Problem on a HypercubeSubjects: Data Structures and Algorithms (cs.DS)
Motivated by a phylogeny reconstruction problem in evolutionary biology, we study the minimum Steiner arborescence problem on directed hypercubes (\textsc{MSA-DH}).
Given $m$, representing the directed hypercube $\vec{Q}_m$, and a set of terminals $R$, the problem asks to find a Steiner arborescence that spans $R$ with minimum cost. As $m$ implicitly represents $\vec{Q}_m$ comprising $2^{m}$ vertices, the running time analyses of traditional Steiner tree algorithms on general graphs does not give a clear understanding of the actual complexity of this problem.
We present algorithms that exploit the structure of the hypercube and run in time polynomial in $|R|$ and $m$.
We explore the \textsc{MSA-DH} problem on three natural parameters -- $R$, and two above-guarantee parameters, number of Steiner nodes $p$ and penalty $q$. For above-guarantee parameters, the parameterized \textsc{MSA-DH} problem takes $p \geq 0$ or $q\geq 0$ as input, and outputs a Steiner arborescence with at most $|R| + p - 1$ or $m + q$ edges respectively.
We present the following results ($\tilde{\mathcal{O}}$ hides the polynomial factors):
\begin{enumerate}
\item An exact algorithm that runs in $\tilde{\mathcal{O}}(3^{|R|})$ time.
\item A randomized algorithm that runs in $\tilde{\mathcal{O}}(9^q)$ time with success probability $\geq 4^{-q}$.
\item An exact algorithm that runs in $\tilde{\mathcal{O}}(36^q)$ time.
\item A $(1+q)$-approximation algorithm that runs in $\tilde{\mathcal{O}}(1.25284^q)$ time.
\item An $\mathcal{O}\left(p\ell_{\mathrm{max}} \right)$-additive approximation algorithm that runs in $\tilde{\mathcal{O}}(\ell_{\mathrm{max}}^{p+2})$ time, where $\ell_{\mathrm{max}}$ is the maximum distance of any terminal from the root.
\end{enumerate} - [753] arXiv:2110.08196 (replaced) [pdf, ps, other]
-
Title: The Pebble-Relation Comonad in Finite Model TheoryComments: Extended version of the paper in Logic in Computer Science (LICS) 2022 ProceedingsSubjects: Logic in Computer Science (cs.LO); Logic (math.LO)
The pebbling comonad, introduced by Abramsky, Dawar and Wang, provides a categorical interpretation for the k-pebble games from finite model theory. The coKleisli category of the pebbling comonad specifies equivalences under different fragments and extensions of infinitary k-variable logic. Moreover, the coalgebras over this pebbling comonad characterise treewidth and correspond to tree decompositions. In this paper we introduce the pebble-relation comonad, which characterises pathwidth and whose coalgebras correspond to path decompositions. We further show that the existence of a coKleisli morphism in this comonad is equivalent to truth preservation in the restricted conjunction fragment of k-variable infinitary logic. We do this using Dalmau's pebble-relation game and an equivalent all-in-one pebble game. We then provide a similar treatment to the corresponding coKleisli isomorphisms via a bijective version of the all-in-one pebble game. Finally, we show as a consequence a new Lovász-type theorem relating pathwidth to the restricted conjunction fragment of k-variable infinitary logic with counting quantifiers.
- [754] arXiv:2201.08647 (replaced) [pdf, ps, other]
-
Title: An Approximation Algorithm for $K$-best Enumeration of Minimal Connected Edge Dominating Sets with Cardinality ConstraintsSubjects: Data Structures and Algorithms (cs.DS)
\emph{$K$-best enumeration}, which asks to output $k$-best solutions without duplication, is a helpful tool in data analysis for many fields. In such fields, graphs typically represent data. Thus subgraph enumeration has been paid much attention to such fields. However, $k$-best enumeration tends to be intractable since, in many cases, finding one optimum solution is \NP-hard. To overcome this difficulty, we combine $k$-best enumeration with a concept of enumeration algorithms called \emph{approximation enumeration algorithms}. As a main result, we propose a $4$-approximation algorithm for minimal connected edge dominating sets which outputs $k$ minimal solutions with cardinality at most $4\cdot\OPT$, where $\OPT$ is the cardinality of a minimum solution which is \emph{not} outputted by the algorithm. Our proposed algorithm runs in $\order{nm^2\Delta}$ delay, where $n$, $m$, $\Delta$ are the number of vertices, the number of edges, and the maximum degree of an input graph.
- [755] arXiv:2202.03356 (replaced) [pdf, ps, other]
-
Title: Efficient Direct-Connect Topologies for Collective CommunicationsLiangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, Prithwish Basu, Joud Khoury, Arvind KrishnamurthySubjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
We consider the problem of distilling efficient network topologies for collective communications. We provide an algorithmic framework for constructing direct-connect topologies optimized for the latency vs. bandwidth trade-off associated with the workload. Our approach synthesizes many different topologies and schedules for a given cluster size and degree and then identifies the appropriate topology and schedule for a given workload. Our algorithms start from small, optimal base topologies and associated communication schedules and use techniques that can be iteratively applied to derive much larger topologies and schedules. Additionally, we incorporate well-studied large-scale graph topologies into our algorithmic framework by producing efficient collective schedules for them using a novel polynomial-time algorithm. Our evaluation uses multiple testbeds and large-scale simulations to demonstrate significant performance benefits from our derived topologies and schedules.
- [756] arXiv:2203.04214 (replaced) [pdf, ps, html, other]
-
Title: Building Your Own Trusted Execution Environments Using FPGASubjects: Cryptography and Security (cs.CR)
In recent years, we have witnessed unprecedented growth in using hardware-assisted Trusted Execution Environments (TEE) or enclaves to protect sensitive code and data on commodity devices thanks to new hardware security features, such as Intel SGX and Arm TrustZone. Even though the proprietary TEEs bring many benefits, they have been criticized for lack of transparency, vulnerabilities, and various restrictions. For example, existing TEEs only provide a static and fixed hardware Trusted Computing Base (TCB), which cannot be customized for different applications. Existing TEEs time-share a processor core with the Rich Execution Environment (REE), making execution less efficient and vulnerable to cache side-channel attacks. Moreover, TrustZone lacks hardware support for multiple TEEs, remote attestation, and memory encryption.
In this paper, we present BYOTee (Build Your Own Trusted Execution Environments), which is an easy-to-use infrastructure for building multiple equally secure enclaves by utilizing commodity Field Programmable Gate Arrays (FPGA) devices. BYOTee creates enclaves with customized hardware TCBs, which include softcore CPUs, block RAMs, and peripheral connections, in FPGA on demand. Additionally, BYOTee provides mechanisms to attest the integrity of the customized enclaves' hardware and software stacks, including bitstream, firmware, and the Security-Sensitive Applications (SSA) along with their inputs and outputs to remote verifiers. We implement a BYOTee system for the Xilinx System-on-Chip (SoC) FPGA. The evaluations on the low-end Zynq-7000 system for four SSAs and 12 benchmark applications demonstrate the usage, security, effectiveness, and performance of the BYOTee framework. - [757] arXiv:2203.05294 (replaced) [pdf, ps, other]
-
Title: Domain Generalisation for Object Detection under Covariate and Concept ShiftSubjects: Computer Vision and Pattern Recognition (cs.CV)
Domain generalisation aims to promote the learning of domain-invariant features while suppressing domain-specific features, so that a model can generalise better to previously unseen target domains. An approach to domain generalisation for object detection is proposed, the first such approach applicable to any object detection architecture. Based on a rigorous mathematical analysis, we extend approaches based on feature alignment with a novel component for performing class conditional alignment at the instance level, in addition to aligning the marginal feature distributions across domains at the image level. This allows us to fully address both components of domain shift, i.e. covariate and concept shift, and learn a domain agnostic feature representation. We perform extensive evaluation with both one-stage (FCOS, YOLO) and two-stage (FRCNN) detectors, on a newly proposed benchmark comprising several different datasets for autonomous driving applications (Cityscapes, BDD10K, ACDC, IDD) as well as the GWHD dataset for precision agriculture, and show consistent improvements to the generalisation and localisation performance over baselines and state-of-the-art.
- [758] arXiv:2205.06665 (replaced) [pdf, ps, other]
-
Title: Fully Abstract Encodings of $\lambda$-Calculus in HOcore through Abstract MachinesSubjects: Logic in Computer Science (cs.LO)
We present fully abstract encodings of the call-by-name and call-by-value $\lambda$-calculus into HOcore, a minimal higher-order process calculus with no name restriction. We consider several equivalences on the $\lambda$-calculus side -- normal-form bisimilarity, applicative bisimilarity, and contextual equivalence -- that we internalize into abstract machines in order to prove full abstraction of the encodings. We also demonstrate that this technique scales to the $\lambda\mu$-calculus, i.e., a standard extension of the $\lambda$-calculus with control operators.
- [759] arXiv:2206.00979 (replaced) [pdf, ps, other]
-
Title: Multi-scale Wasserstein Shortest-path Graph Kernels for Graph ClassificationComments: 12 pagesJournal-ref: IEEE Transactions on Artificial Intelligence, 2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Graph kernels are conventional methods for computing graph similarities. However, the existing R-convolution graph kernels cannot resolve both of the two challenges: 1) Comparing graphs at multiple different scales, and 2) Considering the distributions of substructures when computing the kernel matrix. These two challenges limit their performances. To mitigate both of the two challenges, we propose a novel graph kernel called the Multi-scale Wasserstein Shortest-Path graph kernel (MWSP), at the heart of which is the multi-scale shortest-path node feature map, of which each element denotes the number of occurrences of the shortest path around a node. The shortest path is represented by the concatenation of all the labels of nodes in it. Since the shortest-path node feature map can only compare graphs at local scales, we incorporate into it the multiple different scales of the graph structure, which are captured by the truncated BFS trees of different depths rooted at each node in a graph. We use the Wasserstein distance to compute the similarity between the multi-scale shortest-path node feature maps of two graphs, considering the distributions of shortest paths. We empirically validate MWSP on various benchmark graph datasets and demonstrate that it achieves state-of-the-art performance on most datasets.
- [760] arXiv:2206.07328 (replaced) [pdf, ps, other]
-
Title: A Survey : Neural Networks for AMR-to-TextComments: The field of AMR-to-Text is very old and is not interested by current NLP group. As we spoted there are some potential pitfall in the paper, we decided to withdraw the paperSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
AMR-to-text is one of the key techniques in the NLP community that aims at generating sentences from the Abstract Meaning Representation (AMR) graphs. Since AMR was proposed in 2013, the study on AMR-to-Text has become increasingly prevalent as an essential branch of structured data to text because of the unique advantages of AMR as a high-level semantic description of natural language. In this paper, we provide a brief survey of AMR-to-Text. Firstly, we introduce the current scenario of this technique and point out its difficulties. Secondly, based on the methods used in previous studies, we roughly divided them into five categories according to their respective mechanisms, i.e., Rules-based, Seq-to-Seq-based, Graph-to-Seq-based, Transformer-based, and Pre-trained Language Model (PLM)-based. In particular, we detail the neural network-based method and present the latest progress of AMR-to-Text, which refers to AMR reconstruction, Decoder optimization, etc. Furthermore, we present the benchmarks and evaluation methods of AMR-to-Text. Eventually, we provide a summary of current techniques and the outlook for future research.
- [761] arXiv:2206.13444 (replaced) [pdf, ps, html, other]
-
Title: Zenix: Efficient Execution of Bulky Serverless ApplicationsZhiyuan Guo, Zachary Blanco, Junda Chen, Jinmou Li, Zerui Wei, Bili Dong, Ishaan Pota, Mohammad Shahrad, Harry Xu, Yiying ZhangSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Serverless computing, commonly offered as Function-as-a-Service, was initially designed for small, lean applications. However, there has been an increasing desire to run larger, more complex applications (what we call bulky applications) in a serverless manner. Existing strategies for enabling such applications are to either increase function sizes or to rewrite applications as DAGs of functions. These approaches cause significant resource wastage, manual efforts, and/or performance overhead.
We argue that the root cause of these issues is today's function-centric serverless model, where a function is the resource allocation and scaling unit. We propose a new, resource-centric serverless-computing model for executing bulky applications in a resource- and performance-efficient way, and we build the Zenix serverless platform following this model. Our results show that Zenix reduces resource consumption by up to 90% compared to today's function-centric serverless systems, while improving performance by up to 64%. - [762] arXiv:2207.03091 (replaced) [pdf, ps, other]
-
Title: Online SuBmodular + SuPermodular (BP) Maximization with Bandit FeedbackComments: 37 pages, 4 figuresSubjects: Machine Learning (cs.LG)
In the context of online interactive machine learning with combinatorial objectives, we extend purely submodular prior work to more general non-submodular objectives. This includes: (1) those that are additively decomposable into a sum of two terms (a monotone submodular and monotone supermodular term, known as a BP decomposition); and (2) those that are only weakly submodular. In both cases, this allows representing not only competitive (submodular) but also complementary (supermodular) relationships between objects, enhancing this setting to a broader range of applications (e.g., movie recommendations, medical treatments, etc.) where this is beneficial. In the two-term case, moreover, we study not only the more typical monolithic feedback approach but also a novel framework where feedback is available separately for each term. With real-world practicality and scalability in mind, we integrate Nystrom sketching techniques to significantly reduce the computational cost, including for the purely submodular case. In the Gaussian process contextual bandits setting, we show sub-linear theoretical regret bounds in all cases. We also empirically show good applicability to recommendation systems and data subset selection.
- [763] arXiv:2207.07934 (replaced) [pdf, ps, html, other]
-
Title: Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language ModelSubjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) overlook the benefit of generative pre-training, and 2) ignore the textual context related knowledge. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component aims to select the related knowledge according to both textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component targets seamlessly integrating the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced for explicitly utilizing the knowledge to advance the text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
- [764] arXiv:2207.08213 (replaced) [pdf, ps, other]
-
Title: Compensation of Phase Noise in Massive-MIMO Uplink Communications Based on Expectation-Maximization AlgorithmComments: Accepted for publication in Journal of Communications and NetworksSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Phase noise (PN) is a major disturbance in MIMO systems, where the contribution of different oscillators at the transmitter and the receiver side may degrade the overall performance and offset the gains offered by MIMO techniques. This is even more crucial in the case of massive MIMO, since the number of PN sources may increase considerably. In this work, we propose an iterative receiver based on the application of the expectation-maximization algorithm. We consider a massive MIMO framework with a general association of oscillators to antennas, and include other channel disturbances like imperfect channel state information and Rician block fading. At each receiver iteration, given the information on the transmitted symbols, steepest descent is used to estimate the PN samples, with an optimized adaptive step size and a threshold-based stopping rule. The results obtained for several test cases show how the bit error rate and mean square error can benefit from the proposed phase-detection algorithm, even to the point of reaching the same performance as in the case where no PN is present, offering better results than a state-of-the-art alternative. Further analysis of the results allow to draw some useful trade-offs respecting final performance and consumption of resources.
- [765] arXiv:2208.00763 (replaced) [pdf, ps, other]
-
Title: HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and ComputationComments: arXiv admin note: text overlap with arXiv:2112.13972Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Performance (cs.PF)
Quantization for CNN has shown significant progress with the intention of reducing the cost of computation and storage with low-bitwidth data representations. There are, however, no systematic studies on how an existing full-bitwidth processing unit, such as ALU in CPUs and DSP in FPGAs, can be better utilized to deliver significantly higher computation throughput for convolution under various quantized bitwidths. In this study, we propose HiKonv, a unified solution that maximizes the throughput of convolution on a given underlying processing unit with low-bitwidth quantized data inputs through novel bit-wise management and parallel computation. We establish theoretical framework and performance models using a full-bitwidth multiplier for highly parallelized low-bitwidth convolution, and demonstrate new breakthroughs for high-performance computing in this critical domain. For example, a single 32-bit processing unit in CPU can deliver 128 binarized convolution operations (multiplications and additions) and 13 4-bit convolution operations with a single multiplication instruction, and a single 27x18 multiplier in the FPGA DSP can deliver 60, 8 or 2 convolution operations with 1, 4 or 8-bit inputs in one clock cycle. We demonstrate the effectiveness of HiKonv on both CPU and FPGA. On CPU, HiKonv outperforms the baseline implementation with 1 to 8-bit inputs and provides up to 7.6x and 1.4x performance improvements for 1-D convolution, and performs 2.74x and 3.19x over the baseline implementation for 4-bit signed and unsigned data inputs for 2-D convolution. On FPGA, HiKonv solution enables a single DSP to process multiple convolutions with a shorter processing latency. For binarized input, each DSP with HiKonv is equivalent up to 76.6 LUTs. Compared to the DAC-SDC 2020 champion model, HiKonv achieves a 2.37x throughput improvement and 2.61x DSP efficiency improvement, respectively.
- [766] arXiv:2208.07199 (replaced) [pdf, ps, other]
-
Title: On the Complexity of Distance-$d$ Independent Set ReconfigurationComments: 17 pages, 8 figures, minor revisionsSubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
For a fixed positive integer $d \geq 2$, a distance-$d$ independent set (D$d$IS) of a graph is a vertex subset whose distance between any two members is at least $d$. Imagine that there is a token placed on each member of a D$d$IS. Two D$d$ISs are adjacent under Token Sliding ($\mathsf{TS}$) if one can be obtained from the other by moving a token from one vertex to one of its unoccupied adjacent vertices. Under Token Jumping ($\mathsf{TJ}$), the target vertex needs not to be adjacent to the original one. The Distance-$d$ Independent Set Reconfiguration (D$d$ISR) problem under $\mathsf{TS}/\mathsf{TJ}$ asks if there is a corresponding sequence of adjacent D$d$ISs that transforms one given D$d$IS into another. The problem for $d = 2$, also known as the Independent Set Reconfiguration problem, has been well-studied in the literature and its computational complexity on several graph classes has been known. In this paper, we study the computational complexity of D$d$ISR on different graphs under $\mathsf{TS}$ and $\mathsf{TJ}$ for any fixed $d \geq 3$. On chordal graphs, we show that D$d$ISR under $\mathsf{TJ}$ is in $\mathtt{P}$ when $d$ is even and $\mathtt{PSPACE}$-complete when $d$ is odd. On split graphs, there is an interesting complexity dichotomy: D$d$ISR is $\mathtt{PSPACE}$-complete for $d = 2$ but in $\mathtt{P}$ for $d=3$ under $\mathsf{TS}$, while under $\mathsf{TJ}$ it is in $\mathtt{P}$ for $d = 2$ but $\mathtt{PSPACE}$-complete for $d = 3$. Additionally, certain well-known hardness results for $d = 2$ on perfect graphs and planar graphs of maximum degree three and bounded bandwidth can be extended for $d \geq 3$.
- [767] arXiv:2209.03434 (replaced) [pdf, ps, html, other]
-
Title: Sporthesia: Augmenting Sports Videos Using Natural LanguageComments: 10 pages, IEEE VIS conferenceJournal-ref: IEEE Transactions on Visualization and Computer Graphics 2022Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
Augmented sports videos, which combine visualizations and video effects to present data in actual scenes, can communicate insights engagingly and thus have been increasingly popular for sports enthusiasts around the world. Yet, creating augmented sports videos remains a challenging task, requiring considerable time and video editing skills. On the other hand, sports insights are often communicated using natural language, such as in commentaries, oral presentations, and articles, but usually lack visual cues. Thus, this work aims to facilitate the creation of augmented sports videos by enabling analysts to directly create visualizations embedded in videos using insights expressed in natural language. To achieve this goal, we propose a three-step approach - 1) detecting visualizable entities in the text, 2) mapping these entities into visualizations, and 3) scheduling these visualizations to play with the video - and analyzed 155 sports video clips and the accompanying commentaries for accomplishing these steps. Informed by our analysis, we have designed and implemented Sporthesia, a proof-of-concept system that takes racket-based sports videos and textual commentaries as the input and outputs augmented videos. We demonstrate Sporthesia's applicability in two exemplar scenarios, i.e., authoring augmented sports videos using text and augmenting historical sports videos based on auditory comments. A technical evaluation shows that Sporthesia achieves high accuracy (F1-score of 0.9) in detecting visualizable entities in the text. An expert evaluation with eight sports analysts suggests high utility, effectiveness, and satisfaction with our language-driven authoring method and provides insights for future improvement and opportunities.
- [768] arXiv:2209.07048 (replaced) [pdf, ps, other]
-
Title: Automatically Recommend Code Updates: Are We There Yet?Comments: Under review at a SE journalSubjects: Software Engineering (cs.SE)
In recent years, large pre-trained Language Models of Code (CodeLMs) have shown promising results on various software engineering tasks. One such task is automatic code update recommendation, which transforms outdated code snippets into their approved and revised counterparts. Although many CodeLM-based approaches have been proposed, claiming high accuracy, their effectiveness and reliability on real-world code update tasks remain questionable. In this paper, we present the first extensive evaluation of state-of-the-art CodeLMs for automatically recommending code updates. We assess their performance on two diverse datasets of paired updated methods, considering factors such as temporal evolution, project specificity, method size, and update complexity. Our results reveal that while CodeLMs perform well in settings that ignore temporal information, they struggle in more realistic time-wise scenarios and generalize poorly to new projects. Furthermore, CodeLM performance decreases significantly for larger methods and more complex updates. Furthermore, we observe that many CodeLM-generated "updates" are actually null, especially in time-wise settings, and meaningful edits remain challenging. Our findings highlight the significant gap between the perceived and actual effectiveness of CodeLMs for real-world code update recommendation and emphasize the need for more research on improving their practicality, robustness, and generalizability.
- [769] arXiv:2209.07618 (replaced) [pdf, ps, other]
-
Title: Differentiable Bilevel Programming for Stackelberg Congestion GamesSubjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
In a Stackelberg congestion game (SCG), a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game. Often formulated as bilevel programs, large-scale SCGs are well known for their intractability and complexity. Here, we attempt to tackle this computational challenge by marrying traditional methodologies with the latest differentiable programming techniques in machine learning. The core idea centers on replacing the lower-level equilibrium problem with a smooth evolution trajectory defined by the imitative logit dynamic (ILD), which we prove converges to the equilibrium of the congestion game under mild conditions. Building upon this theoretical foundation, we propose two new local search algorithms for SCGs. The first is a gradient descent algorithm that obtains the derivatives by unrolling ILD via differentiable programming. Thanks to the smoothness of ILD, the algorithm promises both efficiency and scalability. The second algorithm adds a heuristic twist by cutting short the followers' evolution trajectory. Behaviorally, this means that, instead of anticipating the followers' best response at equilibrium, the leader seeks to approximate that response by only looking ahead a limited number of steps. Our numerical experiments are carried out over various instances of classic SCG applications, ranging from toy benchmarks to large-scale real-world examples. The results show the proposed algorithms are reliable and scalable local solvers that deliver high-quality solutions with greater regularity and significantly less computational effort compared to the many incumbents included in our study.
- [770] arXiv:2210.02119 (replaced) [pdf, ps, other]
-
Title: ISFL: Federated Learning for Non-i.i.d. Data with Local Importance SamplingJournal-ref: IEEE Internet of Things Journal (2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
As a promising learning paradigm integrating computation and communication, federated learning (FL) proceeds the local training and the periodic sharing from distributed clients. Due to the non-i.i.d. data distribution on clients, FL model suffers from the gradient diversity, poor performance, bad convergence, etc. In this work, we aim to tackle this key issue by adopting importance sampling (IS) for local training. We propose importance sampling federated learning (ISFL), an explicit framework with theoretical guarantees. Firstly, we derive the convergence theorem of ISFL to involve the effects of local importance sampling. Then, we formulate the problem of selecting optimal IS weights and obtain the theoretical solutions. We also employ a water-filling method to calculate the IS weights and develop the ISFL algorithms. The experimental results on CIFAR-10 fit the proposed theorems well and verify that ISFL reaps better performance, sampling efficiency, as well as explainability on non-i.i.d. data. To the best of our knowledge, ISFL is the first non-i.i.d. FL solution from the local sampling aspect which exhibits theoretical compatibility with neural network models. Furthermore, as a local sampling approach, ISFL can be easily migrated into other emerging FL frameworks.
- [771] arXiv:2210.10012 (replaced) [pdf, ps, html, other]
-
Title: Log-linear Guardedness and its ImplicationsComments: Accepted as a long paper in ACL 2023Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model \emph{can} be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.
- [772] arXiv:2210.11634 (replaced) [pdf, ps, other]
-
Title: A Polynomial-time Algorithm for the Large Scale of Airplane Refueling ProblemComments: 17 pages, 1 figureSubjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Airplane refueling problem is a nonlinear unconstrained optimization problem with $n!$ feasible solutions. Given a fleet of $n$ airplanes with mid-air refueling technique, the question is to find the best refueling policy to make the last remaining airplane travels the farthest. In order to deal with the large scale of airplanes refueling instances, we proposed the definition of sequential feasible solution by employing the refueling properties of data structure. We proved that if an airplanes refueling instance has feasible solutions, it must have the sequential feasible solutions; and the optimal feasible solution must be the optimal sequential feasible solution. Then we proposed the sequential search algorithm which consists of two steps. The first step of the sequential search algorithm aims to seek out all of the sequential feasible solutions. When the input size of $n$ is greater than an index number, we proved that the number of the sequential feasible solutions will change to grow at a polynomial rate. The second step of the sequential search algorithm aims to search for the maximal sequential feasible solution by bubble sorting all of the sequential feasible solutions. Moreover, we built an efficient computability scheme, according to which we could forecast within a polynomial time the computational complexity of the sequential search algorithm that runs on any given airplanes refueling instance. Thus we could provide a computational strategy for decision makers or algorithm users by considering with their available computing resources.
- [773] arXiv:2211.13174 (replaced) [pdf, ps, other]
-
Title: Evolutionary Generalized Zero-Shot LearningComments: IJCAI2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Attribute-based Zero-Shot Learning (ZSL) has revolutionized the ability of models to recognize new classes not seen during training. However, with the advancement of large-scale models, the expectations have risen. Beyond merely achieving zero-shot generalization, there is a growing demand for universal models that can continually evolve in expert domains using unlabeled data. To address this, we introduce a scaled-down instantiation of this challenge: Evolutionary Generalized Zero-Shot Learning (EGZSL). This setting allows a low-performing zero-shot model to adapt to the test data stream and evolve online. We elaborate on three challenges of this special task, \ie, catastrophic forgetting, initial prediction bias, and evolutionary data class bias. Moreover, we propose targeted solutions for each challenge, resulting in a generic method capable of continuous evolution from a given initial IGZSL model. Experiments on three popular GZSL benchmark datasets demonstrate that our model can learn from the test data stream while other baselines fail. Codes are available at \url{this https URL}.
- [774] arXiv:2211.15363 (replaced) [pdf, ps, other]
-
Title: On the Security Vulnerabilities of Text-to-SQL ModelsComments: Best Paper Candidate at ISSRE 2023. Replaced "PLM" with "LLM" for better visibilitySubjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Databases (cs.DB); Machine Learning (cs.LG); Software Engineering (cs.SE)
Although it has been demonstrated that Natural Language Processing (NLP) algorithms are vulnerable to deliberate attacks, the question of whether such weaknesses can lead to software security threats is under-explored. To bridge this gap, we conducted vulnerability tests on Text-to-SQL systems that are commonly used to create natural language interfaces to databases. We showed that the Text-to-SQL modules within six commercial applications can be manipulated to produce malicious code, potentially leading to data breaches and Denial of Service attacks. This is the first demonstration that NLP models can be exploited as attack vectors in the wild. In addition, experiments using four open-source language models verified that straightforward backdoor attacks on Text-to-SQL systems achieve a 100% success rate without affecting their performance. The aim of this work is to draw the community's attention to potential software security issues associated with NLP algorithms and encourage exploration of methods to mitigate against them.
- [775] arXiv:2301.09362 (replaced) [pdf, ps, other]
-
Title: A Comprehensive Survey on Heart Sound Analysis in the Deep Learning EraComments: Accepted by IEEE Computational Intelligence MagazineSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Heart sound auscultation has been applied in clinical usage for early screening of cardiovascular diseases. Due to the high demand for auscultation expertise, automatic auscultation can help with auxiliary diagnosis and reduce the burden of training professional clinicians. Nevertheless, there is a limit to classic machine learning's performance improvement in the era of big data. Deep learning has outperformed classic machine learning in many research fields, as it employs more complex model architectures with a stronger capability of extracting effective representations. Moreover, it has been successfully applied to heart sound analysis in the past years. As most review works about heart sound analysis were carried out before 2017, the present survey is the first to work on a comprehensive overview to summarise papers on heart sound analysis with deep learning published in 2017--2022. This work introduces both classic machine learning and deep learning for comparison, and further offer insights about the advances and future research directions in deep learning for heart sound analysis. Our repository is publicly available at \url{this https URL}.
- [776] arXiv:2301.13748 (replaced) [pdf, ps, other]
-
Title: Archetypal Analysis++: Rethinking the Initialization StrategyComments: 27 pages, 17 figures, accepted at the Transactions on Machine Learning ResearchSubjects: Machine Learning (cs.LG)
Archetypal analysis is a matrix factorization method with convexity constraints. Due to local minima, a good initialization is essential, but frequently used initialization methods yield either sub-optimal starting points or are prone to get stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective function, similar to $k$-means++. In fact, we argue that $k$-means++ already approximates the proposed initialization method. Furthermore, we suggest to adapt an efficient Monte Carlo approximation of $k$-means++ to AA++. In an extensive empirical evaluation of 15 real-world data sets of varying sizes and dimensionalities and considering two pre-processing strategies, we show that AA++ almost always outperforms all baselines, including the most frequently used ones.
- [777] arXiv:2302.01314 (replaced) [pdf, ps, other]
-
Title: Universal Coding for Shannon Ciphers under Side-Channel AttacksComments: 6 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1801.02563, arXiv:2201.11670, arXiv:1901.05940Subjects: Information Theory (cs.IT)
We study the universal coding under side-channel attacks posed and investigated by Oohama and Santoso (2022). They proposed a theoretical security model for Shannon cipher system under side-channel attacks, where the adversary is not only allowed to collect ciphertexts by eavesdropping the public communication channel, but is also allowed to collect the physical information leaked by the devices where the cipher system is implemented on such as running time, power consumption, electromagnetic radiation, etc. For any distributions of the plain text, any noisy channels through which the adversary observe the corrupted version of the key, and any measurement device used for collecting the physical information, we can derive an achievable rate region for reliability and security such that if we compress the ciphertext with rate within the achievable rate region, then: (1) anyone with secret key will be able to decrypt and decode the ciphertext correctly, but (2) any adversary who obtains the ciphertext and also the side physical information will not be able to obtain any information about the hidden source as long as the leaked physical information is encoded with a rate within the rate region.
- [778] arXiv:2302.03549 (replaced) [pdf, ps, other]
-
Title: An Achievable and Analytic Solution to Information Bottleneck for Gaussian MixturesSubjects: Information Theory (cs.IT)
In this paper, we study a remote source coding scenario in which binary phase shift keying (BPSK) modulation sources are corrupted by additive white Gaussian noise (AWGN). An intermediate node, such as a relay, receives these observations and performs additional compression to balance complexity and relevance. This problem can be further formulated as an information bottleneck (IB) problem with Bernoulli sources and Gaussian mixture observations. However, no closed-form solution exists for this IB problem. To address this challenge, we propose a unified achievable scheme that employs three different compression/quantization strategies for intermediate node processing by using two-level quantization, multi-level deterministic quantization, and soft quantization with the hyperbolic tangent ($\tanh$) function, respectively. In addition, we extend our analysis to the vector mixture Gaussian observation problem and explore its application in machine learning for binary classification with information leakage. Numerical evaluations show that the proposed scheme has a near-optimal performance over various signal-to-noise ratios (SNRs), compared to the Blahut-Arimoto (BA) algorithm, and has better performance than some existing numerical methods such as the information dropout approach. Furthermore, experiments conducted on the realistic MNIST dataset also validate the superior classification accuracy of our method compared to the information dropout approach.
- [779] arXiv:2303.03476 (replaced) [pdf, ps, html, other]
-
Title: iBall: Augmenting Basketball Videos with Gaze-moderated Embedded VisualizationsComments: ACM CHI23Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
We present iBall, a basketball video-watching system that leverages gaze-moderated embedded visualizations to facilitate game understanding and engagement of casual fans. Video broadcasting and online video platforms make watching basketball games increasingly accessible. Yet, for new or casual fans, watching basketball videos is often confusing due to their limited basketball knowledge and the lack of accessible, on-demand information to resolve their confusion. To assist casual fans in watching basketball videos, we compared the game-watching behaviors of casual and die-hard fans in a formative study and developed iBall based on the fndings. iBall embeds visualizations into basketball videos using a computer vision pipeline, and automatically adapts the visualizations based on the game context and users' gaze, helping casual fans appreciate basketball games without being overwhelmed. We confrmed the usefulness, usability, and engagement of iBall in a study with 16 casual fans, and further collected feedback from 8 die-hard fans.
- [780] arXiv:2303.05614 (replaced) [pdf, ps, other]
-
Title: On the Existence of Anomalies, The Reals CaseSubjects: Computational Complexity (cs.CC)
The Independence Postulate (IP) is a finitary Church-Turing Thesis, saying mathematical sequences are independent from physical ones. Modelling observations as infinite sequences of real numbers, IP implies the existence of anomalies.
- [781] arXiv:2303.06847 (replaced) [pdf, ps, other]
-
Title: Label Distribution Learning from Logical LabelSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Label distribution learning (LDL) is an effective method to predict the label description degree (a.k.a. label distribution) of a sample. However, annotating label distribution (LD) for training samples is extremely costly. So recent studies often first use label enhancement (LE) to generate the estimated label distribution from the logical label and then apply external LDL algorithms on the recovered label distribution to predict the label distribution for unseen samples. But this step-wise manner overlooks the possible connections between LE and LDL. Moreover, the existing LE approaches may assign some description degrees to invalid labels. To solve the above problems, we propose a novel method to learn an LDL model directly from the logical label, which unifies LE and LDL into a joint model, and avoids the drawbacks of the previous LE methods. Extensive experiments on various datasets prove that the proposed approach can construct a reliable LDL model directly from the logical label, and produce more accurate label distribution than the state-of-the-art LE methods.
- [782] arXiv:2303.08367 (replaced) [pdf, ps, other]
-
Title: Uncertainty-Aware Pedestrian Trajectory Prediction via Distributional DiffusionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Tremendous efforts have been put forth on predicting pedestrian trajectory with generative models to accommodate uncertainty and multi-modality in human behaviors. An individual's inherent uncertainty, e.g., change of destination, can be masked by complex patterns resulting from the movements of interacting pedestrians. However, latent variable-based generative models often entangle such uncertainty with complexity, leading to limited either latent expressivity or predictive diversity. In this work, we propose to separately model these two factors by implicitly deriving a flexible latent representation to capture intricate pedestrian movements, while integrating predictive uncertainty of individuals with explicit bivariate Gaussian mixture densities over their future locations. More specifically, we present a model-agnostic uncertainty-aware pedestrian trajectory prediction framework, parameterizing sufficient statistics for the mixture of Gaussians that jointly comprise the multi-modal trajectories. We further estimate these parameters of interest by approximating a denoising process that progressively recovers pedestrian movements from noise. Unlike previous studies, we translate the predictive stochasticity to explicit distributions, allowing it to readily generate plausible future trajectories indicating individuals' self-uncertainty. Moreover, our framework is compatible with different neural net architectures. We empirically show the performance gains over state-of-the-art even with lighter backbones, across most scenes on two public benchmarks.
- [783] arXiv:2303.17314 (replaced) [pdf, ps, other]
-
Title: PFL: a Probabilistic Logic for Fault TreesComments: arXiv admin note: text overlap with arXiv:2208.13424Journal-ref: In: Chechik, M., Katoen, JP., Leucker, M. (eds) Formal Methods. FM 2023. Lecture Notes in Computer Science, vol 14000. Springer, ChamSubjects: Logic in Computer Science (cs.LO)
Safety-critical infrastructures must operate in a safe and reliable way. Fault tree analysis is a widespread method used for risk assessment of these systems: fault trees (FTs) are required by, e.g., the Federal Aviation Administration and the Nuclear Regulatory Commission. In spite of their popularity, little work has been done on formulating structural queries about FT and analyzing these, e.g., when evaluating potential scenarios, and to give practitioners instruments to formulate queries on FTs in an understandable yet powerful way. In this paper, we aim to fill this gap by extending BFL [32], a logic that reasons about Boolean FTs. To do so, we introduce a Probabilistic Fault tree Logic (PFL). PFL is a simple, yet expressive logic that supports easier formulation of complex scenarios and specification of FT properties that comprise probabilities. Alongside PFL, we present LangPFL, a domain specific language to further ease property specification. We showcase PFL and LangPFL by applying them to a COVID-19 related FT and to a FT for an oil/gas pipeline. Finally, we present theory and model checking algorithms based on binary decision diagrams (BDDs).
- [784] arXiv:2304.03985 (replaced) [pdf, ps, html, other]
-
Title: On Rotation Distance of Rank Bounded TreesComments: 28 pages, 2 figures, Abstract shortened to meet arxiv requirements, accepted journal versionSubjects: Data Structures and Algorithms (cs.DS)
Computing the rotation distance between two binary trees with $n$ internal nodes efficiently (in $poly(n)$ time) is a long standing open question in the study of height balancing in tree data structures. In this paper, we initiate the study of this problem bounding the rank of the trees given at the input (defined by Ehrenfeucht and Haussler (1989) in the context of decision trees). We define the rank-bounded rotation distance between two given binary trees $T_1$ and $T_2$ (with $n$ internal nodes) of rank at most $r$, denoted by $d_r(T_1,T_2)$, as the length of the shortest sequence of rotations that transforms $T_1$ to $T_2$ with the restriction that the intermediate trees must be of rank at most $r$. We show that the rotation distance problem reduces in polynomial time to the rank bounded rotation distance problem. This motivates the study of the problem in the combinatorial and algorithmic frontiers. Observing that trees with rank $1$ coincide exactly with skew trees (binary trees where every internal node has at least one leaf as a child), we show the following results in this frontier :
We present an $O(n^2)$ time algorithm for computing $d_1(T_1,T_2)$. That is, when the given trees are skew trees (we call this variant as skew rotation distance problem) - where the intermediate trees are restricted to be skew as well. In particular, our techniques imply that for any two skew trees $d(T_1,T_2) \le n^2$.
We show the following upper bound : for any two trees $T_1$ and $T_2$ of rank at most $r_1$ and $r_2$ respectively, we have that: $d_r(T_1,T_2) \le n^2 (1+(2n+1)(r_1+r_2-2))$ where $r = max\{r_1,r_2\}$. This bound is asymptotically tight for $r=1$.
En route our proof of the above theorems, we associate binary trees to permutations and bivariate polynomials, and prove several characterizations in the case of skew trees. - [785] arXiv:2304.05215 (replaced) [pdf, ps, other]
-
Title: A Billion-scale Foundation Model for Remote Sensing ImagesComments: This manuscript is the accepted version for IEEE IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (J-STARS)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As the potential of foundation models in visual tasks has garnered significant attention, pretraining these models before downstream tasks has become a crucial step. The three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters. Recently, research in the remote sensing field has focused primarily on the pretraining method and the size of the dataset, with limited emphasis on the number of model parameters. This paper addresses this gap by examining the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks such as rotated object detection and semantic segmentation. We pretrained foundation models with varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to determine whether performance in downstream tasks improved with an increase in parameters. To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field. Furthermore, we propose an effective method for scaling up and fine-tuning a vision transformer in the remote sensing field. To evaluate general performance in downstream tasks, we employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object detection, and the Potsdam and LoveDA datasets for semantic segmentation. Experimental results demonstrated that, across all benchmark datasets and downstream tasks, the performance of the foundation models and data efficiency improved as the number of parameters increased. Moreover, our models achieve the state-of-the-art performance on several datasets including DIOR-R, Postdam, and LoveDA.
- [786] arXiv:2304.05219 (replaced) [pdf, ps, other]
-
Title: BanditQ: Fair Bandits with Guaranteed RewardsComments: To appear in the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024), Barcelona, SpainSubjects: Machine Learning (cs.LG); Performance (cs.PF)
Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound (UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their objective of playing the most rewarding arm as frequently as possible while ignoring the rest. In this paper, we consider a fair prediction problem in the stochastic setting with a guaranteed minimum rate of accrual of rewards for each arm. We study the problem in both full-information and bandit feedback settings. Combining queueing-theoretic techniques with adversarial bandits, we propose a new online policy, called BanditQ, that achieves the target reward rates while conceding a regret and target rate violation penalty of at most $O(T^{\frac{3}{4}}).$ The regret bound in the full-information setting can be further improved to $O(\sqrt{T})$ under either a monotonicity assumption or when considering time-averaged regret. The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard adversarial MAB problem. The analysis of the BanditQ policy involves a new self-bounding inequality, which might be of independent interest.
- [787] arXiv:2304.07590 (replaced) [pdf, ps, other]
-
Title: Self-collaboration Code Generation via ChatGPTComments: Accepted to TOSEMSubjects: Software Engineering (cs.SE)
Although Large Language Models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex tasks through collaborative teamwork, a strategy that significantly controls development complexity and enhances software quality. Inspired by this, we present a self-collaboration framework for code generation employing LLMs, exemplified by ChatGPT. Specifically, through role instructions, 1) Multiple LLM agents act as distinct `experts', each responsible for a specific subtask within a complex task; 2) Specify the way to collaborate and interact, so that different roles form a virtual team to facilitate each other's work, ultimately the virtual team addresses code generation tasks collaboratively without the need for human intervention. To effectively organize and manage this virtual team, we incorporate software-development methodology into the framework. Thus, we assemble an elementary team consisting of three LLM roles (i.e., analyst, coder, and tester) responsible for software development's analysis, coding, and testing stages. We conduct comprehensive experiments on various code-generation benchmarks. Experimental results indicate that self-collaboration code generation relatively improves 29.9%-47.1% Pass@1 compared to the base LLM agent. Moreover, we showcase that self-collaboration could potentially enable LLMs to efficiently handle complex repository-level tasks that are not readily solved by the single LLM agent.
- [788] arXiv:2304.11563 (replaced) [pdf, ps, html, other]
-
Title: Stochastic Window Mean-Payoff GamesComments: 38 pages, minor changes: added remarks about randomized strategies, improved phrasings and notations. Full version of paper accepted in FoSSaCS 2024Subjects: Computer Science and Game Theory (cs.GT)
Stochastic two-player games model systems with an environment that is both adversarial and stochastic. The environment is modeled by a player (Player 2) who tries to prevent the system (Player 1) from achieving its objective. We consider finitary versions of the traditional mean-payoff objective, replacing the long-run average of the payoffs by payoff average computed over a finite sliding window. Two variants have been considered: in one variant, the maximum window length is fixed and given, while in the other, it is not fixed but is required to be bounded. For both variants, we present complexity bounds and algorithmic solutions for computing strategies for Player 1 to ensure that the objective is satisfied with positive probability, with probability 1, or with probability at least $p$, regardless of the strategy of Player 2. The solution crucially relies on a reduction to the special case of non-stochastic two-player games. We give a general characterization of prefix-independent objectives for which this reduction holds. The memory requirement for both players in stochastic games is also the same as in non-stochastic games by our reduction. Moreover, for non-stochastic games, we improve upon the upper bound for the memory requirement of Player 1 and upon the lower bound for the memory requirement of Player 2.
- [789] arXiv:2304.13138 (replaced) [pdf, ps, other]
-
Title: The Update-Equivalence Framework for Decision-Time PlanningSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The process of revising (or constructing) a policy at execution time -- known as decision-time planning -- has been key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to imperfect-information games, leading to superhuman performance in poker. However, these methods involve solving subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on solving subgames, but rather on update equivalence. In this update-equivalence framework, decision-time planning algorithms replicate the updates of last-iterate algorithms, which need not rely on public information. This facilitates scalability to games with large amounts of non-public information. Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent. We validate the performance of these algorithms in cooperative and adversarial domains, notably in Hanabi, the standard benchmark for search in fully cooperative imperfect-information games. Here, our mirror descent approach exceeds or matches the performance of public information-based search while using two orders of magnitude less search time. This is the first instance of a non-public-information-based algorithm outperforming public-information-based approaches in a domain they have historically dominated.
- [790] arXiv:2305.00264 (replaced) [pdf, ps, other]
-
Title: A Comprehensive Review of Image Line Segment Detection and Description: Taxonomies, Comparisons, and ChallengesComments: This work has been accepted by the IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) for publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Computer Vision and Pattern Recognition (cs.CV)
An image line segment is a fundamental low-level visual feature that delineates straight, slender, and uninterrupted portions of objects and scenarios within images. Detection and description of line segments lay the basis for numerous vision tasks. Although many studies have aimed to detect and describe line segments, a comprehensive review is lacking, obstructing their progress. This study fills the gap by comprehensively reviewing related studies on detecting and describing two-dimensional image line segments to provide researchers with an overall picture and deep understanding. Based on their mechanisms, two taxonomies for line segment detection and description are presented to introduce, analyze, and summarize these studies, facilitating researchers to learn about them quickly and extensively. The key issues, core ideas, advantages and disadvantages of existing methods, and their potential applications for each category are analyzed and summarized, including previously unknown findings. The challenges in existing methods and corresponding insights for potentially solving them are also provided to inspire researchers. In addition, some state-of-the-art line segment detection and description algorithms are evaluated without bias, and the evaluation code will be publicly available. The theoretical analysis, coupled with the experimental results, can guide researchers in selecting the best method for their intended vision applications. Finally, this study provides insights for potentially interesting future research directions to attract more attention from researchers to this field.
- [791] arXiv:2305.04234 (replaced) [pdf, ps, other]
-
Title: On guarded extensions of MMSNPSubjects: Computational Complexity (cs.CC); Logic in Computer Science (cs.LO)
We investigate logics and classes of problems below Fagin's existential second-order logic (ESO) and above Feder and Vardi's logic for constraint satisfaction problems (CSP), the so called monotone monadic SNP without inequality (MMSNP). It is known that MMSNP has a dichotomy between P and NP-complete but the removal of any of these three restrictions imposed on SNP yields a logic that is Ptime equivalent to ESO: so by Ladner's theorem we have three stronger sibling logics that are nondichotomic above MMSNP. In this paper, we explore the area between these four logics, mostly by considering guarded extensions of MMSNP, with the ultimate goal being to obtain logics above MMSNP that exhibit such a dichotomy.
- [792] arXiv:2305.06869 (replaced) [pdf, ps, other]
-
Title: An Adaptive Graduated Nonconvexity Loss Function for Robust Nonlinear Least Squares SolutionsSubjects: Robotics (cs.RO)
Many problems in robotics, such as estimating the state from noisy sensor data or aligning two point clouds, can be posed and solved as least-squares problems. Unfortunately, vanilla nonminimal solvers for least-squares problems are notoriously sensitive to outliers. As such, various robust loss functions have been proposed to reduce the sensitivity to outliers. Examples of loss functions include pseudo-Huber, Cauchy, and Geman-McClure. Recently, these loss functions have been generalized into a single loss function that enables the best loss function to be found adaptively based on the distribution of the residuals. However, even with the generalized robust loss function, most nonminimal solvers can only be solved locally given a prior state estimate due to the nonconvexity of the problem. The first contribution of this paper is to combine graduated nonconvexity (GNC) with the generalized robust loss function to solve least-squares problems without a prior state estimate and without the need to specify a loss function. Moreover, existing loss functions, including the generalized loss function, are based on Gaussian-like distribution. However, residuals are often defined as the squared norm of a multivariate error and distributed in a Chi-like fashion. The second contribution of this paper is to apply a norm-aware adaptive robust loss function within a GNC framework. The proposed approach enables a GNC formulation of a generalized loss function such that GNC can be readily applied to a wider family of loss functions. Furthermore, simulations and experiments demonstrate that the proposed method is more robust compared to non-GNC counterparts, and yields faster convergence times compared to other GNC formulations.
- [793] arXiv:2305.11716 (replaced) [pdf, ps, html, other]
-
Title: Efficient and Deterministic Search Strategy Based on Residual Projections for Point Cloud Registration with CorrespondencesJournal-ref: IEEE Transactions on Intelligent Vehicles, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Estimating the rigid transformation between two LiDAR scans through putative 3D correspondences is a typical point cloud registration paradigm. Current 3D feature matching approaches commonly lead to numerous outlier correspondences, making outlier-robust registration techniques indispensable. Many recent studies have adopted the branch and bound (BnB) optimization framework to solve the correspondence-based point cloud registration problem globally and deterministically. Nonetheless, BnB-based methods are time-consuming to search the entire 6-dimensional parameter space, since their computational complexity is exponential to the solution domain dimension in the worst-case. To enhance algorithm efficiency, existing works attempt to decouple the 6 degrees of freedom (DOF) original problem into two 3-DOF sub-problems, thereby reducing the search space. In contrast, our approach introduces a novel pose decoupling strategy based on residual projections, decomposing the raw registration problem into three sub-problems. Subsequently, we embed interval stabbing into BnB to solve these sub-problems within a lower two-dimensional domain, resulting in efficient and deterministic registration. Moreover, our method can be adapted to address the challenging problem of simultaneous pose and registration. Through comprehensive experiments conducted on challenging synthetic and real-world datasets, we demonstrate that the proposed method outperforms state-of-the-art methods in terms of efficiency while maintaining comparable robustness.
- [794] arXiv:2305.17223 (replaced) [pdf, ps, other]
-
Title: Do We Really Need a Large Number of Visual Prompts?Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Due to increasing interest in adapting models on resource-constrained edges, parameter-efficient transfer learning has been widely explored. Among various methods, Visual Prompt Tuning (VPT), prepending learnable prompts to input space, shows competitive fine-tuning performance compared to training of full network parameters. However, VPT increases the number of input tokens, resulting in additional computational overhead. In this paper, we analyze the impact of the number of prompts on fine-tuning performance and self-attention operation in a vision transformer architecture. Through theoretical and empirical analysis we show that adding more prompts does not lead to linear performance improvement. Further, we propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts. We validate our methods on FGVC and VTAB-1k tasks and show that our approach reduces the number of prompts by ~70% while maintaining accuracy.
- [795] arXiv:2306.02865 (replaced) [pdf, ps, other]
-
Title: Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-CriticComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Learning high-quality $Q$-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works primarily focus on addressing the value overestimation issue, an outcome of adopting function approximators and off-policy learning. Deviating from the common viewpoint, we observe that $Q$-values are often underestimated in the latter stage of the RL training process, potentially hindering policy learning and reducing sample efficiency. We find that such a long-neglected phenomenon is often related to the use of inferior actions from the current policy in Bellman updates as compared to the more optimal action samples in the replay buffer. To address this issue, our insight is to incorporate sufficient exploitation of past successes while maintaining exploration optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a simple yet effective approach that updates $Q$-value using both historical best-performing actions and the current policy. Based on BEE, the resulting practical algorithm BAC outperforms state-of-the-art methods in over 50 continuous control tasks and achieves strong performance in failure-prone scenarios and real-world robot tasks. Benchmark results and videos are available at this https URL.
- [796] arXiv:2306.02914 (replaced) [pdf, ps, other]
-
Title: Beyond Generating Code: Evaluating GPT on a Data Visualization CourseChen Zhu-Tian, Chenyang Zhang, Qianwen Wang, Jakob Troidl, Simon Warchol, Johanna Beyer, Nils Gehlenborg, Hanspeter PfisterComments: vis short papgeSubjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
This paper presents an empirical evaluation of the performance of the Generative Pre-trained Transformer (GPT) model in Harvard's CS171 data visualization course. While previous studies have focused on GPT's ability to generate code for visualizations, this study goes beyond code generation to evaluate GPT's abilities in various visualization tasks, such as data interpretation, visualization design, visual data exploration, and insight communication. The evaluation utilized GPT-3.5 and GPT-4 to complete assignments of CS171, and included a quantitative assessment based on the established course rubrics, a qualitative analysis informed by the feedback of three experienced graders, and an exploratory study of GPT's capabilities in completing border visualization tasks. Findings show that GPT-4 scored 80% on quizzes and homework, and TFs could distinguish between GPT- and human-generated homework with 70% accuracy. The study also demonstrates GPT's potential in completing various visualization tasks, such as data cleanup, interaction with visualizations, and insight communication. The paper concludes by discussing the strengths and limitations of GPT in data visualization, potential avenues for incorporating GPT in broader visualization tasks, and the need to redesign visualization education.
- [797] arXiv:2306.03745 (replaced) [pdf, ps, other]
-
Title: Soft Merging of Experts with Adaptive RoutingSubjects: Machine Learning (cs.LG)
Sparsely activated neural networks with conditional computation learn to route their inputs through different "expert" subnetworks, providing a form of modularity that densely activated models lack. Despite their possible benefits, models with learned routing often underperform their parameter-matched densely activated counterparts as well as models that use non-learned heuristic routing strategies. In this paper, we hypothesize that these shortcomings stem from the gradient estimation techniques used to train sparsely activated models that use non-differentiable discrete routing decisions. To address this issue, we introduce Soft Merging of Experts with Adaptive Routing (SMEAR), which avoids discrete routing by using a single "merged" expert constructed via a weighted average of all of the experts' parameters. By routing activations through a single merged expert, SMEAR does not incur a significant increase in computational costs and enables standard gradient-based training. We empirically validate that models using SMEAR outperform models that route based on metadata or learn sparse routing through gradient estimation. Furthermore, we provide qualitative analysis demonstrating that the experts learned via SMEAR exhibit a significant amount of specialization. All of the code used in our experiments is publicly available.
- [798] arXiv:2306.05515 (replaced) [pdf, ps, other]
-
Title: PeFLL: Personalized Federated Learning by Learning to LearnSubjects: Machine Learning (cs.LG)
We present PeFLL, a new personalized federated learning algorithm that improves over the state-of-the-art in three aspects: 1) it produces more accurate models, especially in the low-data regime, and not only for clients present during its training phase, but also for any that may emerge in the future; 2) it reduces the amount of on-client computation and client-server communication by providing future clients with ready-to-use personalized models that require no additional finetuning or optimization; 3) it comes with theoretical guarantees that establish generalization from the observed clients to future ones. At the core of PeFLL lies a learning-to-learn approach that jointly trains an embedding network and a hypernetwork. The embedding network is used to represent clients in a latent descriptor space in a way that reflects their similarity to each other. The hypernetwork takes as input such descriptors and outputs the parameters of fully personalized client models. In combination, both networks constitute a learning algorithm that achieves state-of-the-art performance in several personalized federated learning benchmarks.
- [799] arXiv:2306.08426 (replaced) [pdf, ps, html, other]
-
Title: Patterns of Patterns IIJoseph Corneli, Noorah Alhasan, Leo Vivier, Alex Murphy, Raymond S. Puzio, Abby Tabor, Sridevi Ayloo, Charlotte Pierce, Charles J. Danoff, Mary Tedeschi, Manvinder Singh, Kajol KhetanComments: 27 pages, to appear in proceedings of PLoP 2023Subjects: Social and Information Networks (cs.SI)
Our earlier paper "Patterns of Patterns" combined three techniques from training, futures studies, and design in a design pattern called PLACARD that helps groups of people work together effectively. We used that pattern in five hands-on workshop case studies which took place at various locations in the US and the UK. This experience report documents what we learned, including the way our thinking about PLACARD evolved, together with additional patterns our work generated. We evaluate the reproducibility of our methods and results, and consider the broader economic implications of this way of working. We discuss implications of our prototyping work for the design of future platforms, drawing connections with recent developments in cognitive science and artificial intelligence. This positions our patterns of patterns as a toolkit for the design and governance of systems that combine social dynamics with technical components.
- [800] arXiv:2306.09479 (replaced) [pdf, ps, html, other]
-
Title: Inverse Scaling: When Bigger Isn't BetterIan R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, Samuel R. Bowman, Ethan PerezComments: Published in TMLR (2023), 39 pagesJournal-ref: Transactions on Machine Learning Research (TMLR), 10/2023, https://openreview.net/forum?id=DwgRm72GQFSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at this https URL to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models.
- [801] arXiv:2306.10379 (replaced) [pdf, ps, html, other]
-
Title: Gradient-type subspace iteration methods for the symmetric eigenvalue problemComments: 32 pagesSubjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
This paper explores variants of the subspace iteration algorithm for computing approximate invariant subspaces. The standard subspace iteration approach is revisited and new variants that exploit gradient-type techniques combined with a Grassmann manifold viewpoint are developed. A gradient method as well as a nonlinear conjugate gradient technique are described. Convergence of the gradient-based algorithm is analyzed and a few numerical experiments are reported, indicating that the proposed algorithms are sometimes superior to standard algorithms. This includes the Chebyshev-based subspace iteration and the locally optimal block conjugate gradient method, when compared in terms of number of matrix vector products and computational time, resp. The new methods, on the other hand, do not require estimating optimal parameters. An important contribution of this paper to achieve this good performance is the accurate and efficient implementation of an exact line search. In addition, new convergence proofs are presented for the non-accelerated gradient method that includes a locally exponential convergence if started in a $\mathcal{O(\sqrt{\delta})}$ neighbourhood of the dominant subspace with spectral gap $\delta$.
- [802] arXiv:2306.11941 (replaced) [pdf, ps, other]
-
Title: Efficient Dynamics Modeling in Interactive Environments with Koopman TheoryComments: Accepted to ICLR 2024 and EWRL 2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem from the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over the existing approaches, both in the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL and report promising experimental results.
- [803] arXiv:2306.12608 (replaced) [pdf, ps, html, other]
-
Title: DP-BREM: Differentially-Private and Byzantine-Robust Federated Learning with Client MomentumSubjects: Cryptography and Security (cs.CR)
Federated Learning (FL) allows multiple participating clients to train machine learning models collaboratively while keeping their datasets local and only exchanging the gradient or model updates with a coordinating server. Existing FL protocols are vulnerable to attacks that aim to compromise data privacy and/or model robustness. Recently proposed defenses focused on ensuring either privacy or robustness, but not both. In this paper, we focus on simultaneously achieving differential privacy (DP) and Byzantine robustness for cross-silo FL, based on the idea of learning from history. The robustness is achieved via client momentum, which averages the updates of each client over time, thus reducing the variance of the honest clients and exposing the small malicious perturbations of Byzantine clients that are undetectable in a single round but accumulate over time. In our initial solution DP-BREM, DP is achieved by adding noise to the aggregated momentum, and we account for the privacy cost from the momentum, which is different from the conventional DP-SGD that accounts for the privacy cost from the gradient. Since DP-BREM assumes a trusted server (who can obtain clients' local models or updates), we further develop the final solution called DP-BREM+, which achieves the same DP and robustness properties as DP-BREM without a trusted server by utilizing secure aggregation techniques, where DP noise is securely and jointly generated by the clients. Both theoretical analysis and experimental results demonstrate that our proposed protocols achieve better privacy-utility tradeoff and stronger Byzantine robustness than several baseline methods, under different DP budgets and attack settings.
- [804] arXiv:2306.13491 (replaced) [pdf, ps, html, other]
-
Title: Augmenting Sports Videos with VisCommentatorJournal-ref: IEEE Transactions on Visualization and Computer Graphics ( Volume: 28, Issue: 1, January 2022)Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
Visualizing data in sports videos is gaining traction in sports analytics, given its ability to communicate insights and explicate player strategies engagingly. However, augmenting sports videos with such data visualizations is challenging, especially for sports analysts, as it requires considerable expertise in video editing. To ease the creation process, we present a design space that characterizes augmented sports videos at an element-level (what the constituents are) and clip-level (how those constituents are organized). We do so by systematically reviewing 233 examples of augmented sports videos collected from TV channels, teams, and leagues. The design space guides selection of data insights and visualizations for various purposes. Informed by the design space and close collaboration with domain experts, we design VisCommentator, a fast prototyping tool, to eases the creation of augmented table tennis videos by leveraging machine learning-based data extractors and design space-based visualization recommendations. With VisCommentator, sports analysts can create an augmented video by selecting the data to visualize instead of manually drawing the graphical marks. Our system can be generalized to other racket sports (e.g., tennis, badminton) once the underlying datasets and models are available. A user study with seven domain experts shows high satisfaction with our system, confirms that the participants can reproduce augmented sports videos in a short period, and provides insightful implications into future improvements and opportunities.
- [805] arXiv:2306.16950 (replaced) [pdf, ps, other]
-
Title: Step fusion: Local and global mutual guidanceComments: 8 pages,7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Feature alignment is the primary means of fusing multimodal data. We propose a feature alignment method that fully fuses multimodal information, which stepwise shifts and expands feature information from different modalities to have a consistent representation in a feature space. The proposed method can robustly capture high-level interactions between features of different modalities, thus significantly improving the performance of multimodal learning. We also show that the proposed method outperforms other popular multimodal schemes on multiple tasks. Experimental evaluation of ETT and MIT-BIH-Arrhythmia, datasets shows that the proposed method achieves state of the art performance.
- [806] arXiv:2306.17815 (replaced) [pdf, ps, other]
-
Title: Bayesian Optimization with Formal Safety Guarantees via Online Conformal PredictionComments: 15 pages, 10 figures, under review in an IEEE journalSubjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)
Black-box zero-th order optimization is a central primitive for applications in fields as diverse as finance, physics, and engineering. In a common formulation of this problem, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. In this paper, we study scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions that are tried throughout the optimization process. Focusing on methods based on Bayesian optimization (BO), prior art has introduced an optimization scheme -- referred to as SAFEOPT -- that is guaranteed not to select any unsafe solution with a controllable probability over feedback noise as long as strict assumptions on the safety constraint function are met. In this paper, a novel BO-based approach is introduced that satisfies safety requirements irrespective of properties of the constraint function. This strong theoretical guarantee is obtained at the cost of allowing for an arbitrary, controllable but non-zero, rate of violation of the safety constraint. The proposed method, referred to as SAFE-BOCP, builds on online conformal prediction (CP) and is specialized to the cases in which feedback on the safety constraint is either noiseless or noisy. Experimental results on synthetic and real-world data validate the advantages and flexibility of the proposed SAFE-BOCP.
- [807] arXiv:2307.02249 (replaced) [pdf, ps, other]
-
Title: Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You NeedComments: Accepted by IEEE TCSVTSubjects: Computer Vision and Pattern Recognition (cs.CV)
Weakly supervised whole slide image classification is usually formulated as a multiple instance learning (MIL) problem, where each slide is treated as a bag, and the patches cut out of it are treated as instances. Existing methods either train an instance classifier through pseudo-labeling or aggregate instance features into a bag feature through attention mechanisms and then train a bag classifier, where the attention scores can be used for instance-level classification. However, the pseudo instance labels constructed by the former usually contain a lot of noise, and the attention scores constructed by the latter are not accurate enough, both of which affect their performance. In this paper, we propose an instance-level MIL framework based on contrastive learning and prototype learning to effectively accomplish both instance classification and bag classification tasks. To this end, we propose an instance-level weakly supervised contrastive learning algorithm for the first time under the MIL setting to effectively learn instance feature representation. We also propose an accurate pseudo label generation method through prototype learning. We then develop a joint training strategy for weakly supervised contrastive learning, prototype learning, and instance classifier training. Extensive experiments and visualizations on four datasets demonstrate the powerful performance of our method. Codes are available at this https URL.
- [808] arXiv:2307.03451 (replaced) [pdf, ps, other]
-
Title: Encrypted Dynamic Control exploiting Limited Number of Multiplications and a Method using Ring-LWE based CryptosystemComments: 12 pages, 5 figures, submitted to IEEE Transactions on Systems, Man, and Cybernetics: SystemsSubjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR)
In this paper, we present a method to encrypt dynamic controllers that can be implemented through most homomorphic encryption schemes, including somewhat, leveled fully, and fully homomorphic encryption. To this end, we represent the output of the given controller as a linear combination of a fixed number of previous inputs and outputs. As a result, the encrypted controller involves only a limited number of homomorphic multiplications on every encrypted data, assuming that the output is re-encrypted and transmitted back from the actuator. A guidance for parameter choice is also provided, ensuring that the encrypted controller achieves predefined performance for an infinite time horizon. Furthermore, we propose a customization of the method for Ring-Learning With Errors (Ring-LWE) based cryptosystems, where a vector of messages can be encrypted into a single ciphertext and operated simultaneously, thus reducing computation and communication loads. Unlike previous results, the proposed customization does not require extra algorithms such as rotation, other than basic addition and multiplication. Simulation results demonstrate the effectiveness of the proposed method.
- [809] arXiv:2307.05639 (replaced) [pdf, ps, other]
-
Title: Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural NetworksComments: Accepted in Neural Networks, ElsevierJournal-ref: Neural Networks, Volume 176, 2024, 106335Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Providing a model that achieves a strong predictive performance and is simultaneously interpretable by humans is one of the most difficult challenges in machine learning research due to the conflicting nature of these two objectives. To address this challenge, we propose a modification of the radial basis function neural network model by equipping its Gaussian kernel with a learnable precision matrix. We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed. In particular, the eigenvectors explain the directions of maximum sensitivity of the model revealing the active subspace and suggesting potential applications for supervised dimensionality reduction. At the same time, the eigenvectors highlight the relationship in terms of absolute variation between the input and the latent variables, thereby allowing us to extract a ranking of the input variables based on their importance to the prediction task enhancing the model interpretability. We conducted numerical experiments for regression, classification, and feature selection tasks, comparing our model against popular machine learning models, the state-of-the-art deep learning-based embedding feature selection techniques, and a transformer model for tabular data. Our results demonstrate that the proposed model does not only yield an attractive prediction performance compared to the competitors but also provides meaningful and interpretable results that potentially could assist the decision-making process in real-world applications. A PyTorch implementation of the model is available on GitHub at the following link. this https URL
- [810] arXiv:2307.07652 (replaced) [pdf, ps, html, other]
-
Title: DIGEST: Fast and Communication Efficient Decentralized Learning with Local UpdatesSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Two widely considered decentralized learning algorithms are Gossip and random walk-based learning. Gossip algorithms (both synchronous and asynchronous versions) suffer from high communication cost, while random-walk based learning experiences increased convergence time. In this paper, we design a fast and communication-efficient asynchronous decentralized learning mechanism DIGEST by taking advantage of both Gossip and random-walk ideas, and focusing on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized algorithm building on local-SGD algorithms, which are originally designed for communication efficient centralized learning. We design both single-stream and multi-stream DIGEST, where the communication overhead may increase when the number of streams increases, and there is a convergence and communication overhead trade-off which can be leveraged. We analyze the convergence of single- and multi-stream DIGEST, and prove that both algorithms approach to the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network ResNet20. The simulation results confirm that multi-stream DIGEST has nice convergence properties; i.e., its convergence time is better than or comparable to the baselines in iid setting, and outperforms the baselines in non-iid setting.
- [811] arXiv:2307.08149 (replaced) [pdf, ps, html, other]
-
Title: Problems in NP can Admit Double-Exponential Lower Bounds when Parameterized by Treewidth or Vertex CoverFlorent Foucaud, Esther Galby, Liana Khazaliya, Shaohua Li, Fionn Mc Inerney, Roohani Sharma, Prafullkumar TaleComments: Shortened abstract to meet arxiv requirements. We split the original paper to keep paper length more manageable. This is the first part following an accompanying paper arXiv:2405.01344; and will be presented at ICALP 2024Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Treewidth (tw) is an important parameter that, when bounded, yields tractability for many problems. For example, graph problems expressible in Monadic Second Order (MSO) logic and QUANTIFIED SAT or, more generally, QUANTIFIED CSP, are FPT parameterized by the tw of the input's (primal) graph plus the length of the MSO-formula [Courcelle, Information & Computation 1990] and the quantifier rank [Chen, ECAI 2004], resp. The algorithms from these (meta-)results have running times whose dependence on tw is a tower of exponents. A conditional lower bound by Fichte et al. [LICS 2020] shows that, for QUANTIFIED SAT, the height of this tower is equal to the number of quantifier alternations. Lower bounds showing that at least double-exponential factors in the running time are necessary are rare: there are very few (for tw and vertex cover vc parameterizations) and they are for problems that are complete for #NP, $\Sigma_2^p$, $\Pi_2^p$, or higher levels of the polynomial hierarchy.
We show, for the first time, that it is not necessary to go higher up in the polynomial hierarchy to obtain such lower bounds. We design a novel, yet simple versatile technique based on Sperner families to obtain such lower bounds and apply it to 3 problems: METRIC DIMENSION, STRONG METRIC DIMENSION, and GEODETIC SET. We prove that they do not admit $2^{2^{o(tw)}} \cdot n^{O(1)}$-time algorithms, even on bounded diameter graphs, unless the ETH fails. For STRONG METRIC DIMENSION, the lower bound holds even for vc. We complement our lower bounds with matching upper bounds. - [812] arXiv:2307.08713 (replaced) [pdf, ps, other]
-
Title: Intuitionistic Fuzzy Broad Learning System: Enhancing Robustness Against Noise and OutliersJournal-ref: IEEE Transactions on Fuzzy Systems, 2024Subjects: Machine Learning (cs.LG)
In the realm of data classification, broad learning system (BLS) has proven to be a potent tool that utilizes a layer-by-layer feed-forward neural network. However, the traditional BLS treats all samples as equally significant, which makes it less robust and less effective for real-world datasets with noises and outliers. To address this issue, we propose fuzzy broad learning system (F-BLS) and the intuitionistic fuzzy broad learning system (IF-BLS) models that confront challenges posed by the noise and outliers present in the dataset and enhance overall robustness. Employing a fuzzy membership technique, the proposed F-BLS model embeds sample neighborhood information based on the proximity of each class center within the inherent feature space of the BLS framework. Furthermore, the proposed IF-BLS model introduces intuitionistic fuzzy concepts encompassing membership, non-membership, and score value functions. IF-BLS strategically considers homogeneity and heterogeneity in sample neighborhoods in the kernel space. We evaluate the performance of proposed F-BLS and IF-BLS models on UCI benchmark datasets with and without Gaussian noise. As an application, we implement the proposed F-BLS and IF-BLS models to diagnose Alzheimer's disease (AD). Experimental findings and statistical analyses consistently highlight the superior generalization capabilities of the proposed F-BLS and IF-BLS models over baseline models across all scenarios. The proposed models offer a promising solution to enhance the BLS framework's ability to handle noise and outliers. The source code link of the proposed model is available at this https URL.
- [813] arXiv:2307.08811 (replaced) [pdf, ps, other]
-
Title: Co(ve)rtex: ML Models as storage channels and their (mis-)applicationsMd Abdullah Al Mamun, Quazi Mishkatul Alam, Erfan Shayegani, Pedram Zaree, Ihsen Alouani, Nael Abu-GhazalehSubjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Machine learning (ML) models are overparameterized to support generality and avoid overfitting. The state of these parameters is essentially a "don't-care" with respect to the primary model provided that this state does not interfere with the primary model. In both hardware and software systems, don't-care states and undefined behavior have been shown to be sources of significant vulnerabilities. In this paper, we propose a new information theoretic perspective of the problem; we consider the ML model as a storage channel with a capacity that increases with overparameterization. Specifically, we consider a sender that embeds arbitrary information in the model at training time, which can be extracted by a receiver with a black-box access to the deployed model. We derive an upper bound on the capacity of the channel based on the number of available unused parameters. We then explore black-box write and read primitives that allow the attacker to:(i) store data in an optimized way within the model by augmenting the training data at the transmitter side, and (ii) to read it by querying the model after it is deployed. We also consider a new version of the problem which takes information storage covertness into account. Specifically, to obtain storage covertness, we introduce a new constraint such that the data augmentation used for the write primitives minimizes the distribution shift with the initial (baseline task) distribution. This constraint introduces a level of "interference" with the initial task, thereby limiting the channel's effective capacity. Therefore, we develop optimizations to improve the capacity in this case, including a novel ML-specific substitution based error correction protocol. We believe that the proposed modeling of the problem offers new tools to better understand and mitigate potential vulnerabilities of ML, especially in the context of increasingly large models.
- [814] arXiv:2307.09214 (replaced) [pdf, ps, other]
-
Title: Patrolling Grids with a Bit of MemorySubjects: Robotics (cs.RO); Computational Geometry (cs.CG); Discrete Mathematics (cs.DM); Multiagent Systems (cs.MA); Combinatorics (math.CO)
This work addresses the challenge of patrolling regular grid graphs of any dimension using a single mobile agent with minimal memory and limited sensing range. We show that it is impossible to patrol some grid graphs with $0$ bits of memory, regardless of sensing range, and give an exact characterization of those grid graphs that can be patrolled with $0$ bits of memory and sensing range $V$. On the other hand, we show that an algorithm exists using $1$ bit of memory and $V=1$ that patrols any $d$-dimensional grid graph. This result is surprising given that the agent must be able to move in $2d$ distinct directions to patrol, while $1$ bit of memory allows specifying only two directions per sensory input. Our $1$-bit patrolling algorithm handles this by carefully exploiting a small state-space to access all the needed directions while avoiding getting stuck. Overall, our results give concrete evidence that extremely little memory is needed for patrolling highly regular environments like grid graphs compared to arbitrary graphs. The techniques we use, such as partitioning the environment into sensing regions and exploiting distinct coordinates resulting from higher-dimensionality, may be applicable to analyzing the space complexity of patrolling in other types of regular environments as well.
- [815] arXiv:2307.09913 (replaced) [pdf, ps, other]
-
Title: Exploring Non-Regular Extensions of Propositional Dynamic Logic with Description-Logics FeaturesSubjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
We investigate the impact of non-regular path expressions on the decidability of satisfiability checking and querying in description logics extending ALC. Our primary objects of interest are ALCreg and ALCvpl, the extensions of with path expressions employing, respectively, regular and visibly-pushdown languages. The first one, ALCreg, is a notational variant of the well-known Propositional Dynamic Logic of Fischer and Ladner. The second one, ALCvpl, was introduced and investigated by Loding and Serre in 2007. The logic ALCvpl generalises many known decidable non-regular extensions of ALCreg.
We provide a series of undecidability results. First, we show that decidability of the concept satisfiability problem for ALCvpl is lost upon adding the seemingly innocent Self operator. Second, we establish undecidability for the concept satisfiability problem for ALCvpl extended with nominals. Interestingly, our undecidability proof relies only on one single non-regular (visibly-pushdown) language, namely on r#s# := { r^n s^n | n in N } for fixed role names r and s. Finally, in contrast to the classical database setting, we establish undecidability of query entailment for queries involving non-regular atoms from r#s#, already in the case of ALC-TBoxes. - [816] arXiv:2307.09997 (replaced) [pdf, ps, other]
-
Title: TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase RecognitionComments: Code released at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
To enable context-aware computer assistance in the operating room of the future, cognitive systems need to understand automatically which surgical phase is being performed by the medical team. The primary source of information for surgical phase recognition is typically video, which presents two challenges: extracting meaningful features from the video stream and effectively modeling temporal information in the sequence of visual features. For temporal modeling, attention mechanisms have gained popularity due to their ability to capture long-range dependencies. In this paper, we explore design choices for attention in existing temporal models for surgical phase recognition and propose a novel approach that uses attention more effectively and does not require hand-crafted constraints: TUNeS, an efficient and simple temporal model that incorporates self-attention at the core of a convolutional U-Net structure. In addition, we propose to train the feature extractor, a standard CNN, together with an LSTM on preferably long video segments, i.e., with long temporal context. In our experiments, almost all temporal models performed better on top of feature extractors that were trained with longer temporal context. On these contextualized features, TUNeS achieves state-of-the-art results on the Cholec80 dataset. This study offers new insights on how to use attention mechanisms to build accurate and efficient temporal models for surgical phase recognition. Implementing automatic surgical phase recognition is essential to automate the analysis and optimization of surgical workflows and to enable context-aware computer assistance during surgery, thus ultimately improving patient care.
- [817] arXiv:2307.13243 (replaced) [pdf, ps, other]
-
Title: A Model Predictive Capture Point Control Framework for Robust Humanoid Balancing via Ankle, Hip, and Stepping StrategiesComments: 20 pages,14 figuresSubjects: Robotics (cs.RO)
The robust balancing capability of humanoid robots has been considered one of the crucial requirements for their mobility in real environments. In particular, many studies have been devoted to the efficient implementation of human-inspired ankle, hip, and stepping strategies, to endow humanoids with human-level balancing capability. In this paper, a robust balance control framework for humanoids is proposed. Firstly, a Model Predictive Control (MPC) framework is proposed for Capture Point (CP) tracking control, enabling the integration of ankle, hip, and stepping strategies within a single framework. Additionally, a variable weighting method is introduced that adjusts the weighting parameters of the Centroidal Angular Momentum (CAM) damping control. Secondly, a hierarchical structure of the MPC and a stepping controller was proposed, allowing for the step time optimization. The robust balancing performance of the proposed method is validated through simulations and real robot experiments. Furthermore, a superior balancing performance is demonstrated compared to a state-of-the-art Quadratic Programming (QP)-based CP controller that employs the ankle, hip, and stepping strategies. The supplementary video is available at this https URL
- [818] arXiv:2307.14532 (replaced) [pdf, ps, other]
-
Title: Absorbing Sets in Quantum LDPC CodesSubjects: Information Theory (cs.IT)
Iterative decoder failures of quantum low density parity check (QLDPC) codes are attributed to substructures in the code's graph, known as trapping sets, as well as degenerate errors that can arise in quantum codes. Failure inducing sets are subsets of codeword coordinates that, when initially in error, lead to decoding failure in a trapping set. The purpose of this paper is to examine failure inducing sets of QLDPC codes under syndrome-based iterative decoding. As for classical LDPC codes, we show that absorbing sets play a central role in understanding decoder failures. Raveendran and Vasic initiated the study of quantum trapping sets, where beyond the classical-type trapping sets, they identified rigid symmetric structures (a.k.a symmetric stabilizers) responsible for degenerate errors. In this paper, we show that this behavior is part of a much more general phenomenon that can be described by the absorbing set framework.
- [819] arXiv:2308.05734 (replaced) [pdf, ps, other]
-
Title: AudioLDM 2: Learning Holistic Audio Generation with Self-supervised PretrainingHaohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. PlumbleyComments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. Project page is this https URLSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called "language of audio" (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate state-of-the-art or competitive performance against previous approaches. Our code, pretrained model, and demo are available at this https URL.
- [820] arXiv:2308.08047 (replaced) [pdf, ps, other]
-
Title: Correlated vs. Uncorrelated Randomness in Adversarial Congestion Team GamesSubjects: Computer Science and Game Theory (cs.GT)
We consider team zero-sum network congestion games with $n$ agents playing against $k$ interceptors over a graph $G$. The agents aim to minimize their collective cost of sending traffic over paths in $G$, which is an aggregation of edge costs, while the interceptors aim to maximize the collective cost by increasing some of these edge costs. To evade the interceptors, the agents will usually use randomized strategies. We consider two cases, the correlated case when agents have access to a shared source of randomness, and the uncorrelated case, when each agent has access to only its own source of randomness. We study the additional cost that uncorrelated agents have to bear, specifically by comparing the costs incurred by agents in cost-minimal Nash equilibria when agents can and cannot share randomness.
We consider two natural cost functions on the agents, which measure the invested energy and time, respectively. We prove that for both of these cost functions, the ratio of uncorrelated cost to correlated cost at equilibrium is $O(\min(m_c(G),n))$, where $m_c(G)$ is the mincut size of $G$. This bound is much smaller than the most general case, where a tight, exponential bound of $\Theta((m_c(G))^{n-1})$ on the ratio is known. We also introduce a set of simple agent strategies which are approximately optimal agent strategies. We then establish conditions for when these strategies are optimal agent strategies for each cost function, showing an inherent difference between the two cost functions we study. - [821] arXiv:2308.08487 (replaced) [pdf, ps, other]
-
Title: Temporal Interest Network for User Response PredictionJournal-ref: The Web Conference, 2024Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
User response prediction is essential in industrial recommendation systems, such as online display advertising. Among all the features in recommendation models, user behaviors are among the most critical. Many works have revealed that a user's behavior reflects her interest in the candidate item, owing to the semantic or temporal correlation between behaviors and the candidate. While the literature has individually examined each of these correlations, researchers have yet to analyze them in combination, that is, the semantic-temporal correlation. We empirically measure this correlation and observe intuitive yet robust patterns. We then examine several popular user interest models and find that, surprisingly, none of them learn such correlation well.
To fill this gap, we propose a Temporal Interest Network (TIN) to capture the semantic-temporal correlation simultaneously between behaviors and the target. We achieve this by incorporating target-aware temporal encoding, in addition to semantic encoding, to represent behaviors and the target. Furthermore, we conduct explicit 4-way interaction by deploying target-aware attention and target-aware representation to capture both semantic and temporal correlation. We conduct comprehensive evaluations on two popular public datasets, and our proposed TIN outperforms the best-performing baselines by 0.43% and 0.29% on GAUC, respectively. During online A/B testing in Tencent's advertising platform, TIN achieves 1.65% cost lift and 1.93% GMV lift over the base model. It has been successfully deployed in production since October 2023, serving the WeChat Moments traffic. We have released our code at this https URL. - [822] arXiv:2308.09445 (replaced) [pdf, ps, html, other]
-
Title: parti-gem5: gem5's Timing Mode ParallelisedComments: Pre-print of work presented at SAMOS Conference XXIIISubjects: Hardware Architecture (cs.AR)
Detailed timing models are indispensable tools for the design space exploration of Multiprocessor Systems on Chip (MPSoCs). As core counts continue to increase, the complexity in memory hierarchies and interconnect topologies is also growing, making accurate predictions of design decisions more challenging than ever. In this context, the open-source Full System Simulator (FSS) gem5 is a popular choice for MPSoC design space exploration, thanks to its flexibility and robust set of detailed timing models. However, its single-threaded simulation kernel severely hampers its throughput. To address this challenge, we introduce parti-gem5, an extension of gem5 that enables parallel timing simulations on modern multi-core simulation hosts. Unlike previous works, parti-gem5 supports gem5's timing mode, the O3CPU, and Ruby's custom cache and interconnect models. Compared to reference single-thread simulations, we achieved speedups of up to 42.7x when simulating a 120-core ARM MPSoC on a 64-core x86-64 host system. While our method introduces timing deviations, the error in total simulated time is below 15% in most cases.
- [823] arXiv:2308.13184 (replaced) [pdf, ps, other]
-
Title: Performance Analysis of Finite Blocklength Transmissions Over Wiretap Fading Channels: An Average Information Leakage PerspectiveComments: To appear in IEEE Transactions on Wireless Communications. Note: This is an extended version of our work to be presented at the 2024 IEEE ICC (arXiv:2401.11219)Subjects: Information Theory (cs.IT)
Physical-layer security (PLS) is a promising technique to complement more traditional means of communication security in beyond-5G wireless networks. However, studies of PLS are often based on ideal assumptions such as infinite coding blocklengths or perfect knowledge of the wiretap link's channel state information (CSI). In this work, we study the performance of finite blocklength (FBL) transmissions using a new secrecy metric $\unicode{x2013}$ the average information leakage (AIL). We evaluate the exact and approximate AIL with Gaussian signaling and arbitrary fading channels, assuming that the eavesdropper's instantaneous CSI is unknown. We then conduct case studies that use artificial noise (AN) beamforming to analyze the AIL in both Rayleigh and Rician fading channels. The accuracy of the analytical expressions is verified through extensive simulations, and various insights regarding the impact of key system parameters on the AIL are obtained. Particularly, our results reveal that allowing a small level of AIL can potentially lead to significant reliability enhancements. To improve the system performance, we formulate and solve an average secrecy throughput (AST) optimization problem via both non-adaptive and adaptive design strategies. Our findings highlight the significance of blocklength design and AN power allocation, as well as the impact of their trade-off on the AST.
- [824] arXiv:2308.13540 (replaced) [pdf, ps, html, other]
-
Title: RL-LABEL: A Deep Reinforcement Learning Approach Intended for AR Label Placement in Dynamic ScenariosSubjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
Labels are widely used in augmented reality (AR) to display digital information. Ensuring the readability of AR labels requires placing them occlusion-free while keeping visual linkings legible, especially when multiple labels exist in the scene. Although existing optimization-based methods, such as force-based methods, are effective in managing AR labels in static scenarios, they often struggle in dynamic scenarios with constantly moving objects. This is due to their focus on generating layouts optimal for the current moment, neglecting future moments and leading to sub-optimal or unstable layouts over time. In this work, we present RL-LABEL, a deep reinforcement learning-based method for managing the placement of AR labels in scenarios involving moving objects. RL-LABEL considers the current and predicted future states of objects and labels, such as positions and velocities, as well as the user's viewpoint, to make informed decisions about label placement. It balances the trade-offs between immediate and long-term objectives. Our experiments on two real-world datasets show that RL-LABEL effectively learns the decision-making process for long-term optimization, outperforming two baselines (i.e., no view management and a force-based method) by minimizing label occlusions, line intersections, and label movement distance. Additionally, a user study involving 18 participants indicates that RL-LABEL excels over the baselines in aiding users to identify, compare, and summarize data on AR labels within dynamic scenes.
- [825] arXiv:2308.14222 (replaced) [pdf, ps, other]
-
Title: Accurate complex Jacobi rotationsComments: Supplementary material is available in this https URL and this https URL repositories. This is a slightly extended and enhanced version of the manuscript accepted for publication in Journal of Computational and Applied MathematicsSubjects: Numerical Analysis (math.NA)
This note shows how to compute, to high relative accuracy under mild assumptions, complex Jacobi rotations for diagonalization of Hermitian matrices of order two, using the correctly rounded functions $\mathtt{cr\_hypot}$ and $\mathtt{cr\_rsqrt}$, proposed for standardization in the C programming language as recommended by the IEEE-754 floating-point standard. The rounding to nearest (ties to even) and the non-stop arithmetic are assumed. The numerical examples compare the observed with theoretical bounds on the relative errors in the rotations' elements, and show that the maximal observed departure of the rotations' determinants from unity is smaller than that of the transformations computed by LAPACK.
- [826] arXiv:2308.14391 (replaced) [pdf, ps, other]
-
Title: FIRE: Food Image to REcipe generationComments: Published at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) -- 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based and their success depends heavily on the dataset size and diversity, as well as the quality of learned embeddings. Meanwhile, the emergence of powerful attention-based vision and language models presents a promising avenue for accurate and generalizable recipe generation, which has yet to be extensively explored. This paper proposes FIRE, a novel multimodal methodology tailored to recipe generation in the food computing domain, which generates the food title, ingredients, and cooking instructions based on input food images. FIRE leverages the BLIP model to generate titles, utilizes a Vision Transformer with a decoder for ingredient extraction, and employs the T5 model to generate recipes incorporating titles and ingredients as inputs. We showcase two practical applications that can benefit from integrating FIRE with large language model prompting: recipe customization to fit recipes to user preferences and recipe-to-code transformation to enable automated cooking processes. Our experimental findings validate the efficacy of our proposed approach, underscoring its potential for future advancements and widespread adoption in food computing.
- [827] arXiv:2308.15730 (replaced) [pdf, ps, other]
-
Title: Fully Embedded Time-Series Generative Adversarial NetworksComments: Final Manuscript. Accepted. Neural Computing and Applications May 2024Journal-ref: Neural Comput & Applic (2024)Subjects: Machine Learning (cs.LG)
Generative Adversarial Networks (GANs) should produce synthetic data that fits the underlying distribution of the data being modeled. For real valued time-series data, this implies the need to simultaneously capture the static distribution of the data, but also the full temporal distribution of the data for any potential time horizon. This temporal element produces a more complex problem that can potentially leave current solutions under-constrained, unstable during training, or prone to varying degrees of mode collapse. In FETSGAN, entire sequences are translated directly to the generator's sampling space using a seq2seq style adversarial auto encoder (AAE), where adversarial training is used to match the training distribution in both the feature space and the lower dimensional sampling space. This additional constraint provides a loose assurance that the temporal distribution of the synthetic samples will not collapse. In addition, the First Above Threshold (FAT) operator is introduced to supplement the reconstruction of encoded sequences, which improves training stability and the overall quality of the synthetic data being generated. These novel contributions demonstrate a significant improvement to the current state of the art for adversarial learners in qualitative measures of temporal similarity and quantitative predictive ability of data generated through FETSGAN.
- [828] arXiv:2309.04658 (replaced) [pdf, ps, html, other]
-
Title: Exploring Large Language Models for Communication Games: An Empirical Study on WerewolfComments: 23 pages, 5 figures and 4 tablesSubjects: Computation and Language (cs.CL)
Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we explore the problem of how to engage large language models (LLMs) in communication games, and in response, propose a tuning-free framework. Our approach keeps LLMs frozen, and relies on the retrieval and reflection on past communications and experiences for improvement. An empirical study on the representative and widely-studied communication game, ``Werewolf'', demonstrates that our framework can effectively play Werewolf game without tuning the parameters of the LLMs. More importantly, strategic behaviors begin to emerge in our experiments, suggesting that it will be a fruitful journey to engage LLMs in communication games and associated domains.
- [829] arXiv:2309.05967 (replaced) [pdf, ps, html, other]
-
Title: Optimal $L^2$ error estimates of mass- and energy-conserved FE schemes for a nonlinear Schr\"odinger-type systemSubjects: Numerical Analysis (math.NA)
In this paper, we present an implicit Crank-Nicolson finite element (FE) scheme for solving a nonlinear Schrödinger-type system, which includes Schrödinger-Helmholz system and Schrödinger-Poisson system. In our numerical scheme, we employ an implicit Crank-Nicolson method for time discretization and a conforming FE method for spatial discretization. The proposed method is proved to be well-posedness and ensures mass and energy conservation at the discrete level. Furthermore, we prove optimal $L^2$ error estimates for the fully discrete solutions. Finally, some numerical examples are provided to verify the convergence rate and conservation properties.
- [830] arXiv:2309.07597 (replaced) [pdf, ps, html, other]
-
Title: C-Pack: Packaged Resources To Advance General Chinese EmbeddingComments: Accepted by SIGIR 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive text embedding dataset curated from labeled and unlabeled Chinese corpora for training embedding models. 3) C-TEM is a family of embedding models covering multiple sizes. Our models outperform all prior Chinese text embeddings on C-MTEB by up to +10% upon the time of the release. We also integrate and optimize the entire suite of training methods for C-TEM. Along with our resources on general Chinese embedding, we release our data and models for English text embeddings. The English models achieve state-of-the-art performance on MTEB benchmark; meanwhile, our released English data is 2 times larger than the Chinese data. All these resources are made publicly available at this https URL.
- [831] arXiv:2309.11876 (replaced) [pdf, ps, other]
-
Title: Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-trainingShuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Qiushi Ren, Zhaoheng Xie, Yanye LuSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. Because existing medical contrastive learning strategies focus on extracting image-level representation, which ignores abundant multi-level representations. And they underutilize the decoder either by random initialization or separate pre-training from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for volumetric medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure to pre-train encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations to ensure the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase. Finally, experiments on 12 volumetric medical image datasets indicate our MACL framework outperforms existing 11 contrastive learning strategies. {\itshape i.e.} Our MACL achieves a superior performance with more precise predictions from visualization figures and 2.28\%, 1.32\%, 1.62\% and 1.60\% Average Dice higher than previous best results on CHD, MMWHS, CHAOS and AMOS, respectively. And our MACL also has a strong generalization ability among 5 variant U-Net backbones. Our code will be available at this https URL.
- [832] arXiv:2309.14236 (replaced) [pdf, ps, html, other]
-
Title: MoDem-V2: Visuo-Motor World Models for Real-World Robot ManipulationComments: 10 pages, 8 figuresSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world based on raw pixels, but navigating the contact-rich high-dimensional search space from solely sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments since agent exploration in the real-world without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and safety faults that are catastrophic. In this study, we isolate the root causes behind these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. Building on the latest algorithmic advancements in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations -- exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients in four complex visuo-motor manipulation problems in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit this https URL for videos and more details.
- [833] arXiv:2309.16941 (replaced) [pdf, ps, html, other]
-
Title: G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural NetworksSubjects: Machine Learning (cs.LG)
Graph neural networks (GNNs) have recently emerged as a promising approach for solving the Boolean Satisfiability Problem (SAT), offering potential alternatives to traditional backtracking or local search SAT solvers. However, despite the growing volume of literature in this field, there remains a notable absence of a unified dataset and a fair benchmark to evaluate and compare existing approaches. To address this crucial gap, we present G4SATBench, the first benchmark study that establishes a comprehensive evaluation framework for GNN-based SAT solvers. In G4SATBench, we meticulously curate a large and diverse set of SAT datasets comprising 7 problems with 3 difficulty levels and benchmark a broad range of GNN models across various prediction tasks, training objectives, and inference algorithms. To explore the learning abilities and comprehend the strengths and limitations of GNN-based SAT solvers, we also compare their solving processes with the heuristics in search-based SAT solvers. Our empirical results provide valuable insights into the performance of GNN-based SAT solvers and further suggest that existing GNN models can effectively learn a solving strategy akin to greedy local search but struggle to learn backtracking search in the latent space. Our codebase is available at this https URL.
- [834] arXiv:2310.00982 (replaced) [pdf, ps, other]
-
Title: ViPlanner: Visual Semantic Imperative Learning for Local NavigationSubjects: Robotics (cs.RO)
Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but have limitations regarding the semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to identify traversable geometric occurrences, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, for which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode the multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, enabling a highly scalable training data generation. Experimental results demonstrate resistance to noise, zero-shot sim-to-real transfer, and a decrease of 38.02% in terms of traversability cost compared to purely geometric-based approaches. Code and models are made publicly available: this https URL.
- [835] arXiv:2310.02129 (replaced) [pdf, ps, html, other]
-
Title: Unveiling the Pitfalls of Knowledge Editing for Large Language ModelsComments: ICLR 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Machine Learning (cs.LG)
As the cost associated with fine-tuning Large Language Models (LLMs) continues to rise, recent research efforts have pivoted towards developing methodologies to edit implicit knowledge embedded within LLMs. Yet, there's still a dark cloud lingering overhead -- will knowledge editing trigger butterfly effect? since it is still unclear whether knowledge editing might introduce side effects that pose potential risks or not. This paper pioneers the investigation into the potential pitfalls associated with knowledge editing for LLMs. To achieve this, we introduce new benchmark datasets and propose innovative evaluation metrics. Our results underline two pivotal concerns: (1) Knowledge Conflict: Editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs-a facet neglected by previous methods. (2) Knowledge Distortion: Altering parameters with the aim of editing factual knowledge can irrevocably warp the innate knowledge structure of LLMs. Experimental results vividly demonstrate that knowledge editing might inadvertently cast a shadow of unintended consequences on LLMs, which warrant attention and efforts for future works. Code and data are available at this https URL.
- [836] arXiv:2310.02554 (replaced) [pdf, ps, html, other]
-
Title: zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated LearningComments: Accepted by IEEE Transactions on Big DataSubjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Federated learning (FL) is a machine learning paradigm, which enables multiple and decentralized clients to collaboratively train a model under the orchestration of a central aggregator. FL can be a scalable machine learning solution in big data scenarios. Traditional FL relies on the trust assumption of the central aggregator, which forms cohorts of clients honestly. However, a malicious aggregator, in reality, could abandon and replace the client's training models, or insert fake clients, to manipulate the final training results. In this work, we introduce zkFL, which leverages zero-knowledge proofs to tackle the issue of a malicious aggregator during the training model aggregation process. To guarantee the correct aggregation results, the aggregator provides a proof per round, demonstrating to the clients that the aggregator executes the intended behavior faithfully. To further reduce the verification cost of clients, we use blockchain to handle the proof in a zero-knowledge way, where miners (i.e., the participants validating and maintaining the blockchain data) can verify the proof without knowing the clients' local and aggregated models. The theoretical analysis and empirical results show that zkFL achieves better security and privacy than traditional FL, without modifying the underlying FL network structure or heavily compromising the training speed.
- [837] arXiv:2310.02619 (replaced) [pdf, ps, other]
-
Title: Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEsComments: Accepted to The Twelfth International Conference on Learning Representations, ICLR 2024Subjects: Machine Learning (cs.LG)
Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to the these issues, they are (surprisingly) less considered for time series generation. In this work, we introduce Koopman VAE (KoVAE), a new generative framework that is based on a novel design for the model prior, and that can be optimized for either regular and irregular training data. Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map. Our approach enhances generative modeling with two desired features: (i) incorporating domain knowledge can be achieved by leveraging spectral tools that prescribe constraints on the eigenvalues of the linear map; and (ii) studying the qualitative behavior and stability of the system can be performed using tools from dynamical systems theory. Our results show that KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks. Whether trained on regular or irregular data, KoVAE generates time series that improve both discriminative and predictive metrics. We also present visual evidence suggesting that KoVAE learns probability density functions that better approximate the empirical ground truth distribution.
- [838] arXiv:2310.03031 (replaced) [pdf, ps, other]
-
Title: How Prevalent is Gender Bias in ChatGPT? -- Exploring German and English ChatGPT ResponsesComments: Accepted @ "1st Workshop on Biased Data in Conversational Agents" (co-located with ECML PKDD 2023). This is the author's version of the work. The definite version of record will be published in the proceedingsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs. Thus the awareness of their inherent limitations, and therefore will take the systems' output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues with a special focus on gender biases, which users need to be aware of when processing the system's output. We explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system's responses for biases as well as for syntactic and grammatical mistakes.
- [839] arXiv:2310.04826 (replaced) [pdf, ps, html, other]
-
Title: Augmenting Static Visualizations with PapARVis DesignerSubjects: Human-Computer Interaction (cs.HC)
This paper presents an authoring environment for augmenting static visualizations with virtual content in augmented reality. Augmenting static visualizations can leverage the best of both physical and digital worlds, but its creation currently involves different tools and devices, without any means to explicitly design and debug both static and virtual content simultaneously. To address these issues, we design an environment that seamlessly integrates all steps of a design and deployment workflow through its main features: i) an extension to Vega, ii) a preview, and iii) debug hints that facilitate valid combinations of static and augmented content. We inform our design through a design space with four ways to augment static visualizations. We demonstrate the expressiveness of our tool through examples, including books, posters, projections, wall-sized visualizations. A user study shows high user satisfaction of our environment and confirms that participants can create augmented visualizations in an average of 4.63 minutes.
- [840] arXiv:2310.04843 (replaced) [pdf, ps, html, other]
-
Title: MARVisT: Authoring Glyph-based Visualization in Mobile Augmented RealityJournal-ref: IEEE Transactions on Visualization and Computer Graphics ( Volume: 26, Issue: 8, 01 August 2020)Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
Recent advances in mobile augmented reality (AR) techniques have shed new light on personal visualization for their advantages of fitting visualization within personal routines, situating visualization in a real-world context, and arousing users' interests. However, enabling non-experts to create data visualization in mobile AR environments is challenging given the lack of tools that allow in-situ design while supporting the binding of data to AR content. Most existing AR authoring tools require working on personal computers or manually creating each virtual object and modifying its visual attributes. We systematically study this issue by identifying the specificity of AR glyph-based visualization authoring tool and distill four design considerations. Following these design considerations, we design and implement MARVisT, a mobile authoring tool that leverages information from reality to assist non-experts in addressing relationships between data and virtual glyphs, real objects and virtual glyphs, and real objects and data. With MARVisT, users without visualization expertise can bind data to real-world objects to create expressive AR glyph-based visualizations rapidly and effortlessly, reshaping the representation of the real world with data. We use several examples to demonstrate the expressiveness of MARVisT. A user study with non-experts is also conducted to evaluate the authoring experience of MARVisT.
- [841] arXiv:2310.07805 (replaced) [pdf, ps, other]
-
Title: Generative Modeling with Phase Stochastic BridgesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (ie, position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in \textbf{phase space dynamics}, where a phase space is defined as {an augmented space encompassing both position and velocity.} Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. {In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.} This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool generative modeling.
- [842] arXiv:2310.08529 (replaced) [pdf, ps, other]
-
Title: GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion ModelsTaoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang WangComments: CVPR 2024, Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can help generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but 3D consistency is hard to guarantee. This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation. A fast 3D object generation framework, named as GaussianDreamer, is proposed, where the 3D diffusion model provides priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our GaussianDreamer can generate a high-quality 3D instance or 3D avatar within 15 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time. Demos and code are available at this https URL.
- [843] arXiv:2310.08644 (replaced) [pdf, ps, other]
-
Title: A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific SystemsComments: 68 pages, 7 figures in the main text, 10 figures, and 10 tables in the supplementary materialsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically-interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.
- [844] arXiv:2310.09025 (replaced) [pdf, ps, other]
-
Title: Survey on Near-Space Information Networks: Channel Modeling, Networking, and Transmission PerspectivesSubjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Near-space information networks (NSINs) composed of high-altitude platforms (HAPs) and high- and low-altitude unmanned aerial vehicles (UAVs) are a new regime for providing quick, robust, and cost-efficient sensing and communication services. Precipitated by innovations and breakthroughs in manufacturing, materials, communications, electronics, and control techniques, NSINs have been envisioned as an essential component of the emerging sixth-generation of mobile communication systems. This article reveals some critical issues needing to be tackled in NSINs through conducting experiments and discusses the latest advances in NSINs in the research areas of channel modeling, networking, and transmission from a forward-looking, comparative, and technical evolutionary perspective. In this article, we highlight the characteristics of NSINs and present the promising use cases of NSINs. The impact of airborne platforms' unstable movements on the phase delays of onboard antenna arrays with diverse structures is mathematically analyzed. The recent advances in HAP channel modeling are elaborated on, along with the significant differences between HAP and UAV channel modeling. A comprehensive review of the networking techniques of NSINs in network deployment, handoff management, and network management aspects is provided. Besides, the promising techniques and communication protocols of the physical (PHY) layer, medium access control (MAC) layer, network layer, and transport layer of NSINs for achieving efficient transmission over NSINs are reviewed, and we have conducted experiments with practical NSINs to verify the performance of some techniques. Finally, we outline some open issues and promising directions for NSINs deserved for future study and discuss the corresponding challenges.
- [845] arXiv:2310.09656 (replaced) [pdf, ps, other]
-
Title: Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent SpaceHengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, George KarypisComments: Accepted by ICLR 2024 (Oral Presentation). Code is available at: this https URLSubjects: Machine Learning (cs.LG)
Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular data. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a variational autoencoder (VAE) crafted latent space. The key advantages of the proposed Tabsyn include (1) Generality: the ability to handle a broad spectrum of data types by converting them into a single unified space and explicitly capture inter-column relations; (2) Quality: optimizing the distribution of latent embeddings to enhance the subsequent training of diffusion models, which helps generate high-quality synthetic data, (3) Speed: much fewer number of reverse steps and faster synthesis speed than existing diffusion-based methods. Extensive experiments on six datasets with five metrics demonstrate that Tabsyn outperforms existing methods. Specifically, it reduces the error rates by 86% and 67% for column-wise distribution and pair-wise column correlation estimations compared with the most competitive baselines.
- [846] arXiv:2310.10108 (replaced) [pdf, ps, html, other]
-
Title: On Generative Agents in RecommendationComments: SIGIR 2024 perspective paperSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development. Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a user simulator in recommendation, leveraging LLM-empowered generative agents equipped with user profile, memory, and actions modules specifically tailored for the recommender system. In particular, these agents' profile modules are initialized using real-world datasets (e.g. MovieLens, Steam, Amazon-Book), capturing users' unique tastes and social traits; memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized recommender models in a page-by-page manner, relying on a pre-implemented collaborative filtering-based recommendation algorithm. We delve into both the capabilities and limitations of Agent4Rec, aiming to explore an essential research question: ``To what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems?'' Extensive and multi-faceted evaluations of Agent4Rec highlight both the alignment and deviation between agents and user-personalized preferences. Beyond mere performance comparison, we explore insightful experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks. Our codes are available at this https URL.
- [847] arXiv:2310.11639 (replaced) [pdf, ps, html, other]
-
Title: CrossData: Leveraging Text-Data Connections for Authoring Data DocumentsSubjects: Human-Computer Interaction (cs.HC)
Data documents play a central role in recording, presenting, and disseminating data. Despite the proliferation of applications and systems designed to support the analysis, visualization, and communication of data, writing data documents remains a laborious process, requiring a constant back-and-forth between data processing and writing tools. Interviews with eight professionals revealed that their workflows contained numerous tedious, repetitive, and error-prone operations. The key issue that we identified is the lack of persistent connection between text and data. Thus, we developed CrossData, a prototype that treats text-data connections as persistent, interactive, first-class objects. By automatically identifying, establishing, and leveraging text-data connections, CrossData enables rich interactions to assist in the authoring of data documents. An expert evaluation with eight users demonstrated the usefulness of CrossData, showing that it not only reduced the manual effort in writing data documents but also opened new possibilities to bridge the gap between data exploration and writing.
- [848] arXiv:2310.12181 (replaced) [pdf, ps, other]
-
Title: Precise influence evaluation in complex networksSubjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an)
Evaluating node influence is fundamental for identifying key nodes in complex networks. Existing methods typically rely on generic indicators to rank node influence across diverse networks, thereby ignoring the individualized features of each network itself. Actually, node influence stems not only from general features but the multi-scale individualized information encompassing specific network structure and task. Here we design an active learning architecture to predict node influence quantitively and precisely, which samples representative nodes based on graph entropy correlation matrix integrating multi-scale individualized information. This brings two intuitive advantages: (1) discovering potential high-influence but weak-connected nodes that are usually ignored in existing methods, (2) improving the influence maximization strategy by deducing influence interference. Significantly, our architecture demonstrates exceptional transfer learning capabilities across multiple types of networks, which can identify those key nodes with large disputation across different existing methods. Additionally, our approach, combined with a simple greedy algorithm, exhibits dominant performance in solving the influence maximization problem. This architecture holds great potential for applications in graph mining and prediction tasks.
- [849] arXiv:2310.15419 (replaced) [pdf, ps, other]
-
Title: Fast multiplication of random dense matrices with fixed sparse matricesSubjects: Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation (on-the-fly random number generation) to accelerate this operation. The techniques we propose decrease memory movement, thereby increasing the algorithm's parallel scalability in shared memory architectures. On the Intel Frontera architecture, our algorithm can achieve 2x speedups over libraries such as Eigen and Intel MKL on some examples. In addition, with 32 threads, we can obtain a parallel efficiency of up to approximately 45%. We also present a theoretical analysis for the memory movement lower bound of our algorithm, showing that under mild assumptions, it's possible to beat the data movement lower bound of general matrix-matrix multiply (GEMM) by a factor of $\sqrt M$, where $M$ is the cache size. Finally, we incorporate our sketching algorithm into a randomized least squares solver. For extremely over-determined sparse input matrices, we show that our results are competitive with SuiteSparse; in some cases, we obtain a speedup of 10x over SuiteSparse.
- [850] arXiv:2310.15469 (replaced) [pdf, ps, other]
-
Title: The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy RisksXiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, Zhikun Zhang, XiaoFeng Wang, Haixu TangSubjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
The rapid advancements of large language models (LLMs) have raised public concerns about the privacy leakage of personally identifiable information (PII) within their extensive training datasets. Recent studies have demonstrated that an adversary could extract highly sensitive privacy data from the training data of LLMs with carefully designed prompts. However, these attacks suffer from the model's tendency to hallucinate and catastrophic forgetting (CF) in the pre-training stage, rendering the veracity of divulged PIIs negligible. In our research, we propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PIIs from the pre-training data in LLMs. We formalize the privacy leakage problem in LLMs and explain why forgotten PIIs can be recovered through empirical analysis on open-source language models. Based upon these insights, we evaluate the performance of Janus on both open-source language models and two latest LLMs, i.e., GPT-3.5-Turbo and LLaMA-2-7b. Our experiment results show that Janus amplifies the privacy risks by over 10 times in comparison with the baseline and significantly outperforms the state-of-the-art privacy extraction attacks including prefix attacks and in-context learning (ICL). Furthermore, our analysis validates that existing fine-tuning APIs provided by OpenAI and Azure AI Studio are susceptible to our Janus attack, allowing an adversary to conduct such an attack at a low cost.
- [851] arXiv:2310.15887 (replaced) [pdf, ps, other]
-
Title: AdaptiX -- A Transitional XR Framework for Development and Evaluation of Shared Control Applications in Assistive RoboticsComments: Proceedings of the ACM on Human-Computer Interaction (PACM HCI) as submission of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS'24)Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO); Software Engineering (cs.SE)
With the ongoing efforts to empower people with mobility impairments and the increase in technological acceptance by the general public, assistive technologies, such as collaborative robotic arms, are gaining popularity. Yet, their widespread success is limited by usability issues, specifically the disparity between user input and software control along the autonomy continuum. To address this, shared control concepts provide opportunities to combine the targeted increase of user autonomy with a certain level of computer assistance. This paper presents the free and open-source AdaptiX XR framework for developing and evaluating shared control applications in a high-resolution simulation environment. The initial framework consists of a simulated robotic arm with an example scenario in Virtual Reality (VR), multiple standard control interfaces, and a specialized recording/replay system. AdaptiX can easily be extended for specific research needs, allowing Human-Robot Interaction (HRI) researchers to rapidly design and test novel interaction methods, intervention strategies, and multi-modal feedback techniques, without requiring an actual physical robotic arm during the early phases of ideation, prototyping, and evaluation. Also, a Robot Operating System (ROS) integration enables the controlling of a real robotic arm in a PhysicalTwin approach without any simulation-reality gap. Here, we review the capabilities and limitations of AdaptiX in detail and present three bodies of research based on the framework. AdaptiX can be accessed at this https URL.
- [852] arXiv:2310.16181 (replaced) [pdf, ps, other]
-
Title: Hidden Citations Obscure True Impact in ScienceComments: 71 pages (main + SI)Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
References, the mechanism scientists rely on to signal previous knowledge, lately have turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more discussed is a discovery, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.
- [853] arXiv:2310.17327 (replaced) [pdf, ps, other]
-
Title: Near-Field Positioning and Attitude Sensing Based on Electromagnetic Propagation ModelingComments: 19 pages, 13 figures. Accepted by IEEE Journal on Selected Areas in CommunicationsSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Positioning and sensing over wireless networks are imperative for many emerging applications. However, since traditional wireless channel models over-simplify the user equipment (UE) as a point target, they cannot be used for sensing the attitude of the UE, which is typically described by the spatial orientation. In this paper, a comprehensive electromagnetic propagation modeling (EPM) based on electromagnetic theory is developed to precisely model the near-field channel. For the noise-free case, the EPM model establishes the non-linear functional dependence of observed signals on both the position and attitude of the UE. To address the difficulty in the non-linear coupling, we first propose to divide the distance domain into three regions, separated by the defined Phase ambiguity distance and Spacing constraint distance. Then, for each region, we obtain the closed-form solutions for joint position and attitude estimation with low complexity. Next, to investigate the impact of random noise on the joint estimation performance, the Ziv-Zakai bound (ZZB) is derived to yield useful insights. The expected Cramér-Rao bound (ECRB) is further provided to obtain the simplified closed-form expressions for the performance lower bounds. Our numerical results demonstrate that the derived ZZB can provide accurate predictions of the performance of estimators in all signal-to-noise ratio (SNR) regimes. More importantly, we achieve the millimeter-level accuracy in position estimation and attain the 0.1-level accuracy in attitude estimation.
- [854] arXiv:2310.17347 (replaced) [pdf, ps, other]
-
Title: CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed SamplingComments: Published as a conference paper at ICLR 2024Journal-ref: The Twelfth International Conference on Learning Representations (ICLR 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffusion models that can increase generation diversity, especially at high guidance scales, with minimal loss of sample quality. Our sampling strategy anneals the conditioning signal by adding scheduled, monotonically decreasing Gaussian noise to the conditioning vector during inference to balance diversity and condition alignment. Our Condition-Annealed Diffusion Sampler (CADS) can be used with any pretrained model and sampling algorithm, and we show that it boosts the diversity of diffusion models in various conditional generation tasks. Further, using an existing pretrained diffusion model, CADS achieves a new state-of-the-art FID of 1.70 and 2.31 for class-conditional ImageNet generation at 256$\times$256 and 512$\times$512 respectively.
- [855] arXiv:2310.17868 (replaced) [pdf, ps, other]
-
Title: Resource Allocation for Near-Field Communications: Fundamentals, Tools, and OutlooksSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Extremely large-scale multiple-input-multiple output (XL-MIMO) is a promising technology to achieve high spectral efficiency (SE) and energy efficiency (EE) in future wireless systems. The larger array aperture of XL-MIMO makes communication scenarios closer to the near-field region. Therefore, near-field resource allocation is essential in realizing the above key performance indicators (KPIs). Moreover, the overall performance of XL-MIMO systems heavily depends on the channel characteristics of the selected users, eliminating interference between users through beamforming, power control, etc. The above resource allocation issue constitutes a complex joint multi-objective optimization problem since many variables and parameters must be optimized, including the spatial degree of freedom, rate, power allocation, and transmission technique. In this article, we review the basic properties of near-field communications and focus on the corresponding "resource allocation" problems. First, we identify available resources in near-field communication systems and highlight their distinctions from far-field communications. Then, we summarize optimization tools, such as numerical techniques and machine learning methods, for addressing near-field resource allocation, emphasizing their strengths and limitations. Finally, several important research directions of near-field communications are pointed out for further investigation.
- [856] arXiv:2310.17894 (replaced) [pdf, ps, other]
-
Title: Natural Language Interfaces for Tabular Data Querying and Visualization: A SurveyWeixu Zhang, Yifei Wang, Yuanfeng Song, Victor Junqiu Wei, Yuxing Tian, Yiyan Qi, Jonathan H. Chan, Raymond Chi-Wing Wong, Haiqin YangComments: 20 pages, 4 figures, 5 tables. Accepted by IEEE TKDESubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data using natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing, the key technology facilitating the translation from natural language to SQL queries or data visualization commands. We then delve into the recent advancements in Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.
- [857] arXiv:2310.20228 (replaced) [pdf, ps, html, other]
-
Title: Reconstructing Human Pose from Inertial Measurements: A Generative Model-based Compressive Sensing ApproachSubjects: Human-Computer Interaction (cs.HC)
The ability to sense, localize, and estimate the 3D position and orientation of the human body is critical in virtual reality (VR) and extended reality (XR) applications. This becomes more important and challenging with the deployment of VR/XR applications over the next generation of wireless systems such as 5G and beyond. In this paper, we propose a novel framework that can reconstruct the 3D human body pose of the user given sparse measurements from Inertial Measurement Unit (IMU) sensors over a noisy wireless environment. Specifically, our framework enables reliable transmission of compressed IMU signals through noisy wireless channels and effective recovery of such signals at the receiver, e.g., an edge server. This task is very challenging due to the constraints of transmit power, recovery accuracy, and recovery latency. To address these challenges, we first develop a deep generative model at the receiver to recover the data from linear measurements of IMU signals. The linear measurements of the IMU signals are obtained by a linear projection with a measurement matrix based on the compressive sensing theory. The key to the success of our framework lies in the novel design of the measurement matrix at the transmitter, which can not only satisfy power constraints for the IMU devices but also obtain a highly accurate recovery for the IMU signals at the receiver. This can be achieved by extending the set-restricted eigenvalue condition of the measurement matrix and combining it with an upper bound for the power transmission constraint. Our framework can achieve robust performance for recovering 3D human poses from noisy compressed IMU signals. Additionally, our pre-trained deep generative model achieves signal reconstruction accuracy comparable to an optimization-based approach, i.e., Lasso, but is an order of magnitude faster.
- [858] arXiv:2311.00405 (replaced) [pdf, ps, other]
-
Title: Couples can be tractable: New algorithms and hardness results for the Hospitals / Residents problem with CouplesComments: To appear in the Proceedings of IJCAI 2024: the 33rd International Joint Conference on Artificial IntelligenceSubjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
In this paper we study the {\sc Hospitals / Residents problem with Couples} ({\sc hrc}), where a solution is a stable matching or a report that none exists. We present a novel polynomial-time algorithm that can find a near-feasible stable matching (adjusting the hospitals' capacities by at most 1) in an {\sc hrc} instance where the couples' preferences are sub-responsive (i.e., if one member switches to a better hospital, than the couple also improves) and sub-complete (i.e., each pair of hospitals that are individually acceptable to both members are jointly acceptable for the couple) by reducing it to an instance of the {\sc Stable Fixtures} problem. We also present a polynomial-time algorithm for {\sc hrc} in a sub-responsive, sub-complete instance that is a Dual Market, or where all couples are one of several possible types. We show that our algorithm also implies the polynomial-time solvability of a stable b-matching problem, where the underlying graph is a multigraph with loops.
We complement our algorithms with several hardness results. We show that {\sc hrc} with sub-responsive and sub-complete couples is NP-hard, even with other strong restrictions. We also show that {\sc hrc} with a Dual Market is NP-hard under several simultaneous restrictions. Finally, we show that the problem of finding a matching with the minimum number of blocking pairs in {\sc hrc} is not approximable within $m^{1-\varepsilon}$, for any $\varepsilon>0$, where $m$ is the total length of the hospitals' preference lists, unless P=NP, even if each couple applies to only one pair of hospitals. Our polynomial-time solvability results greatly expand the class of known tractable instances of {\sc hrc} and provide additional evidence as to why long-standing entry-level labour markets that allow couples such as the National Resident Matching Program remain successful to this day. - [859] arXiv:2311.00960 (replaced) [pdf, ps, other]
-
Title: Trajectory Similarity Measurement: An Efficiency PerspectiveComments: Accepted by VLDB 2024Subjects: Databases (cs.DB)
Trajectories that capture object movement have numerous applications, in which similarity computation between trajectories often plays a key role. Traditionally, the similarity between two trajectories is quantified by means of heuristic measures, e.g., Hausdorff or ERP, that operate directly on the trajectories. In contrast, recent studies exploit deep learning to map trajectories to d-dimensional vectors, called embeddings. Then, some distance measure, e.g., Manhattan or Euclidean, is applied to the embeddings to quantify trajectory similarity. The resulting similarities are inaccurate: they only approximate the similarities obtained using the heuristic measures. As distance computation on embeddings is efficient, focus has been on achieving embeddings yielding high accuracy.
Adopting an efficiency perspective, we analyze the time complexities of both the heuristic and the learning-based approaches, finding that the time complexities of the former approaches are not necessarily higher. Through extensive experiments on open datasets, we find that, on both CPUs and GPUs, only a few learning-based approaches can deliver the promised higher efficiency, when the embeddings can be pre-computed, while heuristic approaches are more efficient for one-off computations. Among the learning-based approaches, the self-attention-based ones are the fastest to learn embeddings that also yield the highest accuracy for similarity queries. These results have implications for the use of trajectory similarity approaches given different application requirements. - [860] arXiv:2311.01026 (replaced) [pdf, ps, other]
-
Title: A Constant Factor Approximation for Directed Feedback Vertex Set in Graphs of Bounded GenusSubjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
The minimum directed feedback vertex set problem consists in finding the minimum set of vertices that should be removed in order to make a directed graph acyclic. This is a well-known NP-hard optimization problem with applications in various fields, such as VLSI chip design, bioinformatics and transaction processing deadlock prevention and node-weighted network design. We show a constant factor approximation for the directed feedback vertex set problem in graphs of bounded genus.
- [861] arXiv:2311.01598 (replaced) [pdf, ps, other]
-
Title: CiFlow: Dataflow Analysis and Optimization of Key Switching for Homomorphic EncryptionSubjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR); Performance (cs.PF)
Homomorphic encryption (HE) is a privacy-preserving computation technique that enables computation on encrypted data. Today, the potential of HE remains largely unrealized as it is impractically slow, preventing it from being used in real applications. A major computational bottleneck in HE is the key-switching operation, accounting for approximately 70% of the overall HE execution time and involving a large amount of data for inputs, intermediates, and keys. Prior research has focused on hardware accelerators to improve HE performance, typically featuring large on-chip SRAMs and high off-chip bandwidth to deal with large scale data. In this paper, we present a novel approach to improve key-switching performance by rigorously analyzing its dataflow. Our primary goal is to optimize data reuse with limited on-chip memory to minimize off-chip data movement. We introduce three distinct dataflows: Max-Parallel (MP), Digit-Centric (DC), and Output-Centric (OC), each with unique scheduling approaches for key-switching computations. Through our analysis, we show how our proposed Output-Centric technique can effectively reuse data by significantly lowering the intermediate key-switching working set and alleviating the need for massive off-chip bandwidth. We thoroughly evaluate the three dataflows using the RPU, a recently published vector processor tailored for ring processing algorithms, which includes HE. This evaluation considers sweeps of bandwidth and computational throughput, and whether keys are buffered on-chip or streamed. With OC, we demonstrate up to 4.16x speedup over the MP dataflow and show how OC can save 12.25x on-chip SRAM by streaming keys for minimal performance penalty.
- [862] arXiv:2311.03410 (replaced) [pdf, ps, other]
-
Title: DP-DCAN: Differentially Private Deep Contrastive Autoencoder Network for Single-cell ClusteringSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)
Single-cell RNA sequencing (scRNA-seq) is important to transcriptomic analysis of gene expression. Recently, deep learning has facilitated the analysis of high-dimensional single-cell data. Unfortunately, deep learning models may leak sensitive information about users. As a result, Differential Privacy (DP) is increasingly used to protect privacy. However, existing DP methods usually perturb whole neural networks to achieve differential privacy, and hence result in great performance overheads. To address this challenge, in this paper, we take advantage of the uniqueness of the autoencoder that it outputs only the dimension-reduced vector in the middle of the network, and design a Differentially Private Deep Contrastive Autoencoder Network (DP-DCAN) by partial network perturbation for single-cell clustering. Since only partial network is added with noise, the performance improvement is obvious and twofold: one part of network is trained with less noise due to a bigger privacy budget, and the other part is trained without any noise. Experimental results of six datasets have verified that DP-DCAN is superior to the traditional DP scheme with whole network perturbation. Moreover, DP-DCAN demonstrates strong robustness to adversarial attacks.
- [863] arXiv:2311.03783 (replaced) [pdf, ps, other]
-
Title: Scene-Driven Multimodal Knowledge Graph Construction for Embodied AIComments: 14 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI); Robotics (cs.RO); Symbolic Computation (cs.SC)
Embodied AI is one of the most popular studies in artificial intelligence and robotics, which can effectively improve the intelligence of real-world agents (i.e. robots) serving human beings. Scene knowledge is important for an agent to understand the surroundings and make correct decisions in the varied open world. Currently, knowledge base for embodied tasks is missing and most existing work use general knowledge base or pre-trained models to enhance the intelligence of an agent. For conventional knowledge base, it is sparse, insufficient in capacity and cost in data collection. For pre-trained models, they face the uncertainty of knowledge and hard maintenance. To overcome the challenges of scene knowledge, we propose a scene-driven multimodal knowledge graph (Scene-MMKG) construction method combining conventional knowledge engineering and large language models. A unified scene knowledge injection framework is introduced for knowledge representation. To evaluate the advantages of our proposed method, we instantiate Scene-MMKG considering typical indoor robotic functionalities (Manipulation and Mobility), named ManipMob-MMKG. Comparisons in characteristics indicate our instantiated ManipMob-MMKG has broad superiority in data-collection efficiency and knowledge quality. Experimental results on typical embodied tasks show that knowledge-enhanced methods using our instantiated ManipMob-MMKG can improve the performance obviously without re-designing model structures complexly. Our project can be found at this https URL
- [864] arXiv:2311.04451 (replaced) [pdf, ps, html, other]
-
Title: Pseduo-Random and de Bruijn Array CodesSubjects: Information Theory (cs.IT)
Pseudo-random arrays and perfect maps are the two-dimensional analogs of M-sequences and de Bruijn sequences, respectively. We modify the definitions to be applied to codes. These codes are also the two-dimensional analogs of certain factors in the de Bruijn graph. These factors are called zero factors and perfect factors in the de Bruijn graph. We apply a folding technique to construct pseudo-random array codes and examine the minimum distance of the constructed codes. The folding is applied on sequences generated from irreducible polynomials or a product of irreducible polynomials with the same degree and the same exponent. Direct and recursive constructions for de Bruijn array codes are presented and discussed.
- [865] arXiv:2311.05600 (replaced) [pdf, ps, other]
-
Title: FogROS2-Config: Optimizing Latency and Cost for Multi-Cloud Robot ApplicationsKaiyuan Chen, Kush Hari, Rohil Khare, Charlotte Le, Trinity Chung, Jaimyn Drake, Jeffrey Ichnowski, John Kubiatowicz, Ken GoldbergComments: Published 2024 IEEE International Conference on Robotics and Automation (ICRA), Former name: FogROS2-SkySubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Cloud service providers provide over 50,000 distinct and dynamically changing set of cloud server options. To help roboticists make cost-effective decisions, we present FogROS2-Config, an open toolkit that takes ROS2 nodes as input and automatically runs relevant benchmarks to quickly return a menu of cloud compute services that tradeoff latency and cost. Because it is infeasible to try every hardware configuration, FogROS2-Config quickly samples tests a small set of edge case servers. We evaluate FogROS2-Config on three robotics application tasks: visual SLAM, grasp planning. and motion planning. FogROS2-Config can reduce the cost by up to 20x. By comparing with a Pareto frontier for cost and latency by running the application task on feasible server configurations, we evaluate cost and latency models and confirm that FogROS2-Config selects efficient hardware configurations to balance cost and latency.
- [866] arXiv:2311.08325 (replaced) [pdf, ps, html, other]
-
Title: Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data StorageComments: 17 pages (double column), 3 figures, accepted at the IEEE Transactions on Molecular, Biological and Multi-scale Communications (TMBMC)Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
DNA strands serve as a storage medium for $4$-ary data over the alphabet $\{A,T,G,C\}$. DNA data storage promises formidable information density, long-term durability, and ease of replicability. However, information in this intriguing storage technology might be corrupted. Experiments have revealed that DNA sequences with long homopolymers and/or with low $GC$-content are notably more subject to errors upon storage.
This paper investigates the utilization of the recently-introduced method for designing lexicographically-ordered constrained (LOCO) codes in DNA data storage. This paper introduces DNA LOCO (D-LOCO) codes, over the alphabet $\{A,T,G,C\}$ with limited runs of identical symbols. These codes come with an encoding-decoding rule we derive, which provides affordable encoding-decoding algorithms. In terms of storage overhead, the proposed encoding-decoding algorithms outperform those in the existing literature. Our algorithms are readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows us to achieve balancing over the entire DNA strand with minimal rate penalty. Moreover, we propose four schemes to bridge consecutive codewords, three of which guarantee single substitution error detection per codeword. We examine the probability of undetecting errors. We also show that D-LOCO codes are capacity-achieving and that they offer remarkably high rates at moderate lengths. - [867] arXiv:2311.08473 (replaced) [pdf, ps, other]
-
Title: Real-time topology optimization via learnable mappingsSubjects: Computational Engineering, Finance, and Science (cs.CE)
In traditional topology optimization, the computing time required to iteratively update the material distribution within a design domain strongly depends on the complexity or size of the problem, limiting its application in real engineering contexts. This work proposes a multi-stage machine learning strategy that aims to predict an optimal topology and the related stress fields of interest, either in 2D or 3D, without resorting to any iterative analysis and design process. The overall topology optimization is treated as regression task in a low-dimensional latent space, that encodes the variability of the target designs. First, a fully-connected model is employed to surrogate the functional link between the parametric input space characterizing the design problem and the latent space representation of the corresponding optimal topology. The decoder branch of an autoencoder is then exploited to reconstruct the desired optimal topology from its latent representation. The deep learning models are trained on a dataset generated through a standard method of topology optimization implementing the solid isotropic material with penalization, for varying boundary and loading conditions. The underlying hypothesis behind the proposed strategy is that optimal topologies share enough common patterns to be compressed into small latent space representations without significant information loss. Results relevant to a 2D Messerschmitt-Bölkow-Blohm beam and a 3D bridge case demonstrate the capabilities of the proposed framework to provide accurate optimal topology predictions in a fraction of a second.
- [868] arXiv:2311.08605 (replaced) [pdf, ps, other]
-
Title: Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency AnalysisSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding the prevalence of bias in these models and its mitigation. Yet, as exemplified by both results on debiasing methods in the literature and reports of alignment-related defects from the wider community, bias remains a poorly understood topic despite its practical relevance. To enhance the understanding of the internal causes of bias, we analyse LLM bias through the lens of causal fairness analysis, which enables us to both comprehend the origins of bias and reason about its downstream consequences and mitigation. To operationalize this framework, we propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the LLM decision process. By applying Activity Dependency Networks (ADNs), we then analyse how these attributes influence an LLM's decision process. We apply our method to LLM ratings of argument quality in political debates. We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment, and discuss the consequences of our findings for human-AI alignment and bias mitigation. Our code and data are at this https URL.
- [869] arXiv:2311.09379 (replaced) [pdf, ps, other]
-
Title: A Characteristic Mapping Method for Vlasov-Poisson with Extreme Resolution PropertiesComments: Code is available at this https URLJournal-ref: Commun. Comput. Phys., 35 (2024), pp. 905-937Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Plasma Physics (physics.plasm-ph)
We propose an efficient semi-Lagrangian characteristic mapping method for solving the one+one-dimensional Vlasov-Poisson equations with high precision on a coarse grid. The flow map is evolved numerically and exponential resolution in linear time is obtained. Global third-order convergence in space and time is shown and conservation properties are assessed. For benchmarking, we consider linear and nonlinear Landau damping and the two-stream instability. We compare the results with a Fourier pseudo-spectral method. The extreme fine-scale resolution features are illustrated showing the method's capabilities to efficiently treat filamentation in fusion plasma simulations.
- [870] arXiv:2311.10365 (replaced) [pdf, ps, other]
-
Title: Dates Fruit Disease Recognition using Machine LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Many countries such as Saudi Arabia, Morocco and Tunisia are among the top exporters and consumers of palm date fruits. Date fruit production plays a major role in the economies of the date fruit exporting countries. Date fruits are susceptible to disease just like any fruit and early detection and intervention can end up saving the produce. However, with the vast farming lands, it is nearly impossible for farmers to observe date trees on a frequent basis for early disease detection. In addition, even with human observation the process is prone to human error and increases the date fruit cost. With the recent advances in computer vision, machine learning, drone technology, and other technologies; an integrated solution can be proposed for the automatic detection of date fruit disease. In this paper, a hybrid features based method with the standard classifiers is proposed based on the extraction of L*a*b color features, statistical features, and Discrete Wavelet Transform (DWT) texture features for the early detection and classification of date fruit disease. A dataset was developed for this work consisting of 871 images divided into the following classes; Healthy date, Initial stage of disease, Malnourished date, and Parasite infected. The extracted features were input to common classifiers such as the Random Forest (RF), Multilayer Perceptron (MLP), Naïve Bayes (NB), and Fuzzy Decision Trees (FDT). The highest average accuracy was achieved when combining the L*a*b, Statistical, and DWT Features.
- [871] arXiv:2311.14477 (replaced) [pdf, ps, other]
-
Title: Simulation Limitations of Affine Cellular AutomataJournal-ref: Theoretical Computer Science, Volume 1003, 2024, ISSN 0304-3975Subjects: Formal Languages and Automata Theory (cs.FL)
Cellular automata are a famous model of computation, yet it is still a challenging task to assess the computational capacity of a given automaton; especially when it comes to showing negative results. In this paper, we focus on studying this problem via the notion of CA relative simulation. We say that automaton A is simulated by B if each space-time diagram of A can be, after suitable transformations, reproduced by B.
We study affine automata - i.e., automata whose local rules are affine mappings of vector spaces. This broad class contains the well-studied cases of additive automata. The main result of this paper shows that (almost) every automaton affine over a finite field F_p can only simulate affine automata over F_p. We discuss how this general result implies, and widely surpasses, limitations of additive automata previously proved in the literature.
We provide a formalization of the simulation notions into algebraic language and discuss how this opens a new path to showing negative results about the computational power of cellular automata using deeper algebraic theorems. - [872] arXiv:2311.15502 (replaced) [pdf, ps, other]
-
Title: Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More PracticalComments: ICML 2024Subjects: Machine Learning (cs.LG)
Complementary-label learning is a weakly supervised learning problem in which each training example is associated with one or multiple complementary labels indicating the classes to which it does not belong. Existing consistent approaches have relied on the uniform distribution assumption to model the generation of complementary labels, or on an ordinary-label training set to estimate the transition matrix in non-uniform cases. However, either condition may not be satisfied in real-world scenarios. In this paper, we propose a novel consistent approach that does not rely on these conditions. Inspired by the positive-unlabeled (PU) learning literature, we propose an unbiased risk estimator based on the Selected-Completely-at-Random assumption for complementary-label learning. We then introduce a risk-correction approach to address overfitting problems. Furthermore, we find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems when using the one-versus-rest strategy. Extensive experimental results on both synthetic and real-world benchmark datasets validate the superiority of our proposed approach over state-of-the-art methods.
- [873] arXiv:2311.15756 (replaced) [pdf, ps, other]
-
Title: Learning Multi-Frequency Partial Correlation GraphsComments: Accepted at IEEE Transactions on Signal ProcessingSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial correlation graph, in which layers correspond to different frequency bands, and partial correlations can occur over just a few layers. To this aim, we formulate and solve two nonconvex learning problems: the first has a closed-form solution and is suitable when there is prior knowledge about the number of partial correlations; the second hinges on an iterative solution based on successive convex approximation, and is effective for the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, the analysis of financial time series confirms that partial correlations exist only within a few frequency bands, underscoring how our methods enable the gaining of valuable insights that would be undetected without discriminating along the frequency domain.
- [874] arXiv:2311.16637 (replaced) [pdf, ps, other]
-
Title: Parallax-Tolerant Image Stitching with Epipolar Displacement FieldSubjects: Computer Vision and Pattern Recognition (cs.CV)
Image stitching with parallax is still a challenging task. Existing methods often struggle to maintain both the local and global structures of the image while reducing alignment artifacts and warping distortions. In this paper, we propose a novel approach that utilizes epipolar geometry to establish a warping technique based on the epipolar displacement field. Initially, the warping rule for pixels in the epipolar geometry is established through the infinite homography. Subsequently, the epipolar displacement field, which represents the sliding distance of the warped pixel along the epipolar line, is formulated by thin-plate splines based on the principle of local elastic deformation. The stitching result can be generated by inversely warping the pixels according to the epipolar displacement field. This method incorporates the epipolar constraints in the warping rule, which ensures high-quality alignment and maintains the projectivity of the panorama. Qualitative and quantitative comparative experiments demonstrate the competitiveness of the proposed method for stitching images with large parallax.
- [875] arXiv:2311.16639 (replaced) [pdf, ps, other]
-
Title: Scaling Political Texts with Large Language Models: Asking a Chatbot Might Be All You NeedSubjects: Computation and Language (cs.CL)
We use instruction-tuned Large Language Models (LLMs) such as GPT-4, MiXtral, and Llama 3 to position political texts within policy and ideological spaces. We directly ask the LLMs where a text document or its author stand on the focal policy dimension. We illustrate and validate the approach by scaling British party manifestos on the economic, social, and immigration policy dimensions; speeches from a European Parliament debate in 10 languages on the anti- to pro-subsidy dimension; Senators of the 117th US Congress based on their tweets on the left-right ideological spectrum; and tweets published by US Representatives and Senators after the training cutoff date of GPT-4. The correlation between the position estimates obtained with the best LLMs and benchmarks based on coding by experts, crowdworkers or roll call votes exceeds .90. This training-free approach also outperforms supervised classifiers trained on large amounts of data. Using instruction-tuned LLMs to scale texts in policy and ideological spaces is fast, cost-efficient, reliable, and reproducible (in the case of open LLMs) even if the texts are short and written in different languages. We conclude with cautionary notes about the need for empirical validation.
- [876] arXiv:2311.16835 (replaced) [pdf, ps, other]
-
Title: Unified-modal Salient Object Detection via Adaptive Prompt LearningComments: 13 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Existing single-modal and multi-modal salient object detection (SOD) methods focus on designing specific architectures tailored for their respective tasks. However, developing completely different models for different tasks leads to labor and time consumption, as well as high computational and practical deployment costs. In this paper, we attempt to address both single-modal and multi-modal SOD in a unified framework called UniSOD, which fully exploits the overlapping prior knowledge between different tasks. Nevertheless, assigning appropriate strategies to modality variable inputs is challenging. To this end, UniSOD learns modality-aware prompts with task-specific hints through adaptive prompt learning, which are plugged into the proposed pre-trained baseline SOD model to handle corresponding tasks, while only requiring few learnable parameters compared to training the entire model. Each modality-aware prompt is generated from a switchable prompt generation block, which adaptively performs structural switching based on single-modal and multi-modal inputs without human intervention. Through end-to-end joint training, UniSOD achieves overall performance improvement on 14 benchmark datasets for RGB, RGB-D, and RGB-T SOD, which demonstrates that our method effectively and efficiently unifies single-modal and multi-modal SOD tasks.
- [877] arXiv:2311.17279 (replaced) [pdf, ps, html, other]
-
Title: LiveTune: Dynamic Parameter Tuning for Feedback-Driven OptimizationSubjects: Machine Learning (cs.LG); Performance (cs.PF)
Feedback-driven optimization, such as traditional machine learning training, is a static process that lacks real-time adaptability of hyperparameters. Tuning solutions for optimization require trial and error paired with checkpointing and schedulers, in many cases feedback from the algorithm is overlooked. Adjusting hyperparameters during optimization usually requires the program to be restarted, wasting utilization and time, while placing unnecessary strain on memory and processors. We present LiveTune, a novel framework allowing real-time parameter adjustment of optimization loops through LiveVariables. Live Variables allow for continuous feedback-driven optimization by storing parameters on designated ports on the system, allowing them to be dynamically adjusted. Extensive evaluations of our framework on standard machine learning training pipelines show saving up to 60 seconds and 5.4 Kilojoules of energy per hyperparameter change. We also show the feasibility and value of LiveTune in a reinforcement learning application where the users change the dynamics of the reward structure while the agent is learning showing 5x improvement over the baseline. Finally, we outline a fully automated workflow to provide end-to-end, unsupervised feedback-driven optimization.
- [878] arXiv:2312.00206 (replaced) [pdf, ps, html, other]
-
Title: SparseGS: Real-Time 360{\deg} Sparse View Synthesis using Gaussian SplattingComments: This is a revised version which includes multiple new components. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views are reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360-degree scenes from sparse training views. We integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by 6.4% in LPIPS and by 12.2% in PSNR, and NeRF-based methods by at least 17.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost.
- [879] arXiv:2312.02419 (replaced) [pdf, ps, other]
-
Title: Human Demonstrations are Generalizable Knowledge for RobotsTe Cui, Guangyan Chen, Tianxing Zhou, Zicai Peng, Mengxiao Hu, Haoyang Lu, Haizhou Li, Meiling Wang, Yi Yang, Yufeng YueSubjects: Robotics (cs.RO)
Learning from human demonstrations is an emerging trend for designing intelligent robotic systems. However, previous methods typically regard videos as instructions, simply dividing them into action sequences for robotic repetition, which poses obstacles to generalization to diverse tasks or object instances. In this paper, we propose a different perspective, considering human demonstration videos not as mere instructions, but as a source of knowledge for robots. Motivated by this perspective and the remarkable comprehension and generalization capabilities exhibited by large language models (LLMs), we propose DigKnow, a method that DIstills Generalizable KNOWledge with a hierarchical structure. Specifically, DigKnow begins by converting human demonstration video frames into observation knowledge. This knowledge is then subjected to analysis to extract human action knowledge and further distilled into pattern knowledge compassing task and object instances, resulting in the acquisition of generalizable knowledge with a hierarchical structure. In settings with different tasks or object instances, DigKnow retrieves relevant knowledge for the current task and object instances. Subsequently, the LLM-based planner conducts planning based on the retrieved knowledge, and the policy executes actions in line with the plan to achieve the designated task. Utilizing the retrieved knowledge, we validate and rectify planning and execution outcomes, resulting in a substantial enhancement of the success rate. Experimental results across a range of tasks and scenes demonstrate the effectiveness of this approach in facilitating real-world robots to accomplish tasks with the knowledge derived from human demonstrations.
- [880] arXiv:2312.05473 (replaced) [pdf, ps, other]
-
Title: Self Model for Embodied Intelligence: Modeling Full-Body Human Musculoskeletal System and Locomotion Control with Hierarchical Low-Dimensional RepresentationComments: ICRA 2024Subjects: Artificial Intelligence (cs.AI)
Modeling and control of the human musculoskeletal system is important for understanding human motor functions, developing embodied intelligence, and optimizing human-robot interaction systems. However, current human musculoskeletal models are restricted to a limited range of body parts and often with a reduced number of muscles. There is also a lack of algorithms capable of controlling over 600 muscles to generate reasonable human movements. To fill this gap, we build a musculoskeletal model (MS-Human-700) with 90 body segments, 206 joints, and 700 muscle-tendon units, allowing simulation of full-body dynamics and interaction with various devices. We develop a new algorithm using low-dimensional representation and hierarchical deep reinforcement learning to achieve state-of-the-art full-body control. We validate the effectiveness of our model and algorithm in simulations with real human locomotion data. The musculoskeletal model, along with its control algorithm, will be made available to the research community to promote a deeper understanding of human motion control and better design of interactive robots.
Project page: this https URL - [881] arXiv:2312.05601 (replaced) [pdf, ps, other]
-
Title: A Meshless Solver for Blood Flow Simulations in Elastic Vessels Using Physics-Informed Neural NetworkSubjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
Investigating blood flow in the cardiovascular system is crucial for assessing cardiovascular health. Computational approaches offer some non-invasive alternatives to measure blood flow dynamics. Numerical simulations based on traditional methods such as finite-element and other numerical discretizations have been extensively studied and have yielded excellent results. However, adapting these methods to real-life simulations remains a complex task. In this paper, we propose a method that offers flexibility and can efficiently handle real-life simulations. We suggest utilizing the physics-informed neural network (PINN) to solve the Navier-Stokes equation in a deformable domain, specifically addressing the simulation of blood flow in elastic vessels. Our approach models blood flow using an incompressible, viscous Navier-Stokes equation in an Arbitrary Lagrangian-Eulerian form. The mechanical model for the vessel wall structure is formulated by an equation of Newton's second law of momentum and linear elasticity to the force exerted by the fluid flow. Our method is a mesh-free approach that eliminates the need for discretization and meshing of the computational domain. This makes it highly efficient in solving simulations involving complex geometries. Additionally, with the availability of well-developed open-source machine learning framework packages and parallel modules, our method can easily be accelerated through GPU computing and parallel computing. To evaluate our approach, we conducted experiments on regular cylinder vessels as well as vessels with plaque on their walls. We compared our results to a solution calculated by Finite Element Methods using a dense grid and small time steps, which we considered as the ground truth solution. We report the relative error and the time consumed to solve the problem, highlighting the advantages of our method.
- [882] arXiv:2312.07553 (replaced) [pdf, ps, other]
-
Title: Hijacking Context in Large Multi-modal ModelsComments: Technical Report. PreprintJournal-ref: ICLR 2024 Workshop on Reliable and Responsible Foundation ModelsSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Recently, Large Multi-modal Models (LMMs) have demonstrated their ability to understand the visual contents of images given the instructions regarding the images. Built upon the Large Language Models (LLMs), LMMs also inherit their abilities and characteristics such as in-context learning where a coherent sequence of images and texts are given as the input prompt. However, we identify a new limitation of off-the-shelf LMMs where a small fraction of incoherent images or text descriptions mislead LMMs to only generate biased output about the hijacked context, not the originally intended context. To address this, we propose a pre-filtering method that removes irrelevant contexts via GPT-4V, based on its robustness towards distribution shift within the contexts. We further investigate whether replacing the hijacked visual and textual contexts with the correlated ones via GPT-4V and text-to-image models can help yield coherent responses.
- [883] arXiv:2312.07804 (replaced) [pdf, ps, html, other]
-
Title: Uncertainty Visualization via Low-Dimensional Posterior ProjectionsSubjects: Computer Vision and Pattern Recognition (cs.CV)
In ill-posed inverse problems, it is commonly desirable to obtain insight into the full spectrum of plausible solutions, rather than extracting only a single reconstruction. Information about the plausible solutions and their likelihoods is encoded in the posterior distribution. However, for high-dimensional data, this distribution is challenging to visualize. In this work, we introduce a new approach for estimating and visualizing posteriors by employing energy-based models (EBMs) over low-dimensional subspaces. Specifically, we train a conditional EBM that receives an input measurement and a set of directions that span some low-dimensional subspace of solutions, and outputs the probability density function of the posterior within that space. We demonstrate the effectiveness of our method across a diverse range of datasets and image restoration problems, showcasing its strength in uncertainty quantification and visualization. As we show, our method outperforms a baseline that projects samples from a diffusion-based posterior sampler, while being orders of magnitude faster. Furthermore, it is more accurate than a baseline that assumes a Gaussian posterior.
- [884] arXiv:2312.10420 (replaced) [pdf, ps, html, other]
-
Title: Satisfiability modulo theories for verifying MILP certificatesComments: Added sequential transformation and parallelizationSubjects: Logic in Computer Science (cs.LO)
Correctness of results returned from mixed-integer linear programming (MILP) solvers is highly desirable, particularly in the context of applications such as hardware verification, compiler optimization, or machine-assisted theorem proving. To this end, VIPR is the first recently proposed certificate format for answers produced by MILP solvers. We design a schema to encode VIPR's inference rules as Satisfiability Modulo Theories (SMT) instances and show the equivalence of the certificates' correctness and the satisfiability of the corresponding SMT instances. In addition, we implement this schema with and without parallelization in a checker for VIPR certificates and test the viability of this approach on benchmark instances found in the literature with the cvc5 solver.
- [885] arXiv:2312.11505 (replaced) [pdf, ps, other]
-
Title: Biscari Network. Tutti gli uomini del principeComments: Uncomplete text. Draft versionSubjects: Computers and Society (cs.CY)
Thanks to its heterogeneity, the Biscari Archive, one of the most representative family's archives in Sicily, in a new digital historical study, became a valuable set of computable data that can lead historians to reconstruct the history of the city of Catania and Sicily. Ignazio Paterno' Castello and his wife Anna, princes of Biscari, were the promoters of the city's reconstruction after the 1693 earthquake, both politically and culturally. How could the digital historical methodology fulfil the traditional Historiography gap about how this noble family built its mighty? As we know, Humanities cannot easily be encapsulated in a few understandable numbers and names. However, historians, boosting Artificial Intelligence, such as Transkribus, and applying Historical Networks Analysis could help computers infer computable meaning from the digitised historical primary source. The Turing Machine became the most powerful tool to help historians understand what happened in the Past and identify the actors in cities and places' cultural and political renewal.
- [886] arXiv:2312.11753 (replaced) [pdf, ps, html, other]
-
Title: Recording and Describing Poker HandsComments: 8 pages, accepted to 2024 IEEE Conference on GamesSubjects: Artificial Intelligence (cs.AI)
This paper introduces the Poker Hand History (PHH) file format, designed to standardize the recording of poker hands across different game variants. Despite poker's widespread popularity in the mainstream culture as a mind sport and its prominence in the field of artificial intelligence (AI) research as a benchmark for imperfect information AI agents, it lacks a consistent format that humans can use to document poker hands across different variants that can also easily be parsed by machines. To address this gap in the literature, we propose the PHH format which provides a concise human-readable machine-friendly representation of hand history that comprehensively captures various details of the hand, ranging from initial game parameters and actions to contextual parameters including but not limited to the venue, players, and time control information. In the supplementary, we provide 10,088 hands covering 11 different variants in the PHH format. The full specification is available on this https URL
- [887] arXiv:2312.12186 (replaced) [pdf, ps, other]
-
Title: Social Learning in Community Structured GraphsSubjects: Social and Information Networks (cs.SI); Multiagent Systems (cs.MA); Signal Processing (eess.SP)
Traditional social learning frameworks consider environments with a homogeneous state, where each agent receives observations conditioned on that true state of nature. In this work, we relax this assumption and study the distributed hypothesis testing problem in a heterogeneous environment, where each agent can receive observations conditioned on their own personalized state of nature (or truth). We particularly focus on community structured networks, where each community admits their own true hypothesis. This scenario is common in various contexts, such as when sensors are spatially distributed, or when individuals in a social network have differing views or opinions. We show that the adaptive social learning strategy is a preferred choice for nonstationary environments, and allows each cluster to discover its own truth.
- [888] arXiv:2312.13064 (replaced) [pdf, ps, other]
-
Title: LPR: Large Language Models-Aided Program ReductionComments: Accepted by ISSTA'24. This is the preprint versionSubjects: Programming Languages (cs.PL); Software Engineering (cs.SE)
Program reduction is a prevalent technique to facilitate compilers' debugging by automatically minimizing bug-triggering programs. Existing program reduction techniques are either generic across languages (e.g., Perses and Vulcan) or specifically customized for one certain language by employing language-specific features, like C-Reduce. However, striking the balance between generality across multiple programming languages and specificity to individual languages in program reduction is yet to be explored. This paper proposes LPR, the first technique utilizing LLMs to perform language-specific program reduction for multiple languages. The core insight is to utilize both the language-generic syntax level program reduction (e.g., Perses) and the language-specific semantic level program transformations learned by LLMs. Alternately, language-generic program reducers efficiently reduce programs into 1-tree-minimality, which is small enough to be manageable for LLMs; LLMs effectively transform programs via the learned semantics to expose new reduction opportunities for the language-generic program reducers to further reduce the programs. Our extensive evaluation on 50 benchmarks across three languages (C, Rust, and JavaScript) has highlighted LPR's practicality and superiority over Vulcan, the state-of-the-art language-generic program reducer. For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust and JavaScript. Moreover, LPR and Vulcan have demonstrated their potential to complement each other. By using Vulcan on LPR's output for C programs, we achieve program sizes comparable to those reduced by C-Reduce. For efficiency, LPR takes 10.77%, 34.88%, 36.96% less time than Vulcan to finish all benchmarks in C, Rust and JavaScript, separately.
- [889] arXiv:2312.14920 (replaced) [pdf, ps, other]
-
Title: A Novel Sampled Clustering Algorithm for Rice Phenotypic DataComments: 31 Pages, 3 Figures, 7 TablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Phenotypic (or Physical) characteristics of plant species are commonly used to perform clustering. In one of our recent works (Shastri et al. (2021)), we used a probabilistically sampled (using pivotal sampling) and spectrally clustered algorithm to group soybean species. These techniques were used to obtain highly accurate clusterings at a reduced cost. In this work, we extend the earlier algorithm to cluster rice species. We improve the base algorithm in three ways.
First, we propose a new function to build the similarity matrix in Spectral Clustering. Commonly, a natural exponential function is used for this purpose. Based upon the spectral graph theory and the involved Cheeger's inequality, we propose the use a base "a" exponential function instead. This gives a similarity matrix spectrum favorable for clustering, which we support via an eigenvalue analysis. Also, the function used to build the similarity matrix in Spectral Clustering was earlier scaled with a fixed factor (called global scaling). Based upon the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with matrix elements (called local scaling) and works better.
Second, to compute the inclusion probability of a specie in the pivotal sampling algorithm, we had earlier used the notion of deviation that captured how far specie's characteristic values were from their respective base values (computed over all species). A maximum function was used before to find the base values. We now use a median function, which is more intuitive. We support this choice using a statistical analysis.
Third, with experiments on 1865 rice species, we demonstrate that in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). Also, our new algorithm is significantly faster than Hierarchical Clustering due to the involved sampling. - [890] arXiv:2312.15881 (replaced) [pdf, ps, other]
-
Title: Attention-aware Social Graph Transformer Networks for Stochastic Trajectory PredictionComments: 14 pages, 9 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
Trajectory prediction is fundamental to various intelligent technologies, such as autonomous driving and robotics. The motion prediction of pedestrians and vehicles helps emergency braking, reduces collisions, and improves traffic safety. Current trajectory prediction research faces problems of complex social interactions, high dynamics and multi-modality. Especially, it still has limitations in long-time prediction. We propose Attention-aware Social Graph Transformer Networks for multi-modal trajectory prediction. We combine Graph Convolutional Networks and Transformer Networks by generating stable resolution pseudo-images from Spatio-temporal graphs through a designed stacking and interception method. Furthermore, we design the attention-aware module to handle social interaction information in scenarios involving mixed pedestrian-vehicle traffic. Thus, we maintain the advantages of the Graph and Transformer, i.e., the ability to aggregate information over an arbitrary number of neighbors and the ability to perform complex time-dependent data processing. We conduct experiments on datasets involving pedestrian, vehicle, and mixed trajectories, respectively. Our results demonstrate that our model minimizes displacement errors across various metrics and significantly reduces the likelihood of collisions. It is worth noting that our model effectively reduces the final displacement error, illustrating the ability of our model to predict for a long time.
- [891] arXiv:2312.16392 (replaced) [pdf, ps, other]
-
Title: Adaptive Depth Networks with Skippable Sub-PathsComments: 15 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Predictable adaptation of network depths can be an effective way to control inference latency and meet the resource condition of various devices. However, previous adaptive depth networks do not provide general principles and a formal explanation on why and which layers can be skipped, and, hence, their approaches are hard to be generalized and require long and complex training steps. In this paper, we present a practical approach to adaptive depth networks that is applicable to various networks with minimal training effort. In our approach, every hierarchical residual stage is divided into two sub-paths, and they are trained to acquire different properties through a simple self-distillation strategy. While the first sub-path is essential for hierarchical feature learning, the second one is trained to refine the learned features and minimize performance degradation if it is skipped. Unlike prior adaptive networks, our approach does not train every target sub-network in an iterative manner. At test time, however, we can connect these sub-paths in a combinatorial manner to select sub-networks of various accuracy-efficiency trade-offs from a single network. We provide a formal rationale for why the proposed training method can reduce overall prediction errors while minimizing the impact of skipping sub-paths. We demonstrate the generality and effectiveness of our approach with convolutional neural networks and transformers.
- [892] arXiv:2312.16980 (replaced) [pdf, ps, other]
-
Title: 3DTINC: Time-Equivariant Non-Contrastive Learning for Predicting Disease Progression from Longitudinal OCTsTaha Emre, Arunava Chakravarty, Antoine Rivail, Dmitrii Lachinov, Oliver Leingang, Sophie Riedl, Julia Mai, Hendrik P.N. Scholl, Sobha Sivaprasad, Daniel Rueckert, Andrew Lotery, Ursula Schmidt-Erfurth, Hrvoje BogunovićComments: Published in IEEE TMISubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Self-supervised learning (SSL) has emerged as a powerful technique for improving the efficiency and effectiveness of deep learning models. Contrastive methods are a prominent family of SSL that extract similar representations of two augmented views of an image while pushing away others in the representation space as negatives. However, the state-of-the-art contrastive methods require large batch sizes and augmentations designed for natural images that are impractical for 3D medical images. To address these limitations, we propose a new longitudinal SSL method, 3DTINC, based on non-contrastive learning. It is designed to learn perturbation-invariant features for 3D optical coherence tomography (OCT) volumes, using augmentations specifically designed for OCT. We introduce a new non-contrastive similarity loss term that learns temporal information implicitly from intra-patient scans acquired at different times. Our experiments show that this temporal information is crucial for predicting progression of retinal diseases, such as age-related macular degeneration (AMD). After pretraining with 3DTINC, we evaluated the learned representations and the prognostic models on two large-scale longitudinal datasets of retinal OCTs where we predict the conversion to wet-AMD within a six months interval. Our results demonstrate that each component of our contributions is crucial for learning meaningful representations useful in predicting disease progression from longitudinal volumetric scans.
- [893] arXiv:2312.17364 (replaced) [pdf, ps, other]
-
Title: Randomness Requirements and Asymmetries in Nash EquilibriaSubjects: Computer Science and Game Theory (cs.GT)
In general, Nash equilibria in normal-form games may require players to play (probabilistically) mixed strategies. We define a measure of the complexity of finite probability distributions and study the complexity required to play Nash equilibria in finite two player $n\times n$ games with rational payoffs. Our central results show that there exist games in which there is an exponential vs. linear gap in the complexity of the mixed distributions that the two players play in the (unique) Nash equilibrium of these games. This gap induces asymmetries in the amounts of space required by the players to represent and sample from the corresponding distributions using known state-of-the-art sampling algorithms. We also establish exponential upper and lower bounds on the complexity of Nash equilibria in normal-form games. These results highlight (i) the nontriviality of the assumption that players can play any mixed strategy and (ii) the disparity in resources that players may require to play Nash equilibria in normal-form games.
- [894] arXiv:2401.00442 (replaced) [pdf, ps, other]
-
Title: A Comprehensive Overview of Fish-Eye Camera Distortion Correction MethodsSubjects: Computer Vision and Pattern Recognition (cs.CV)
The fisheye camera, with its unique wide field of view and other characteristics, has found extensive applications in various fields. However, the fisheye camera suffers from significant distortion compared to pinhole cameras, resulting in distorted images of captured objects. Fish-eye camera distortion is a common issue in digital image processing, requiring effective correction techniques to enhance image quality. This review provides a comprehensive overview of various methods used for fish-eye camera distortion correction. The article explores the polynomial distortion model, which utilizes polynomial functions to model and correct radial distortions. Additionally, alternative approaches such as panorama mapping, grid mapping, direct methods, and deep learning-based methods are discussed. The review highlights the advantages, limitations, and recent advancements of each method, enabling readers to make informed decisions based on their specific needs.
- [895] arXiv:2401.00651 (replaced) [pdf, ps, other]
-
Title: IRWE: Inductive Random Walk for Joint Inference of Identity and Position Network EmbeddingSubjects: Social and Information Networks (cs.SI)
Network embedding, which maps graphs to distributed representations, is a unified framework for various graph inference tasks. According to the topology properties (e.g., structural roles and community memberships of nodes) to be preserved, it can be categorized into the identity and position embedding. However, existing methods can only capture one type of property. Some approaches can support the inductive inference that generalizes the embedding model to new nodes or graphs but relies on the availability of attributes. Due to the complicated correlations between topology and attributes, it is unclear for some inductive methods which type of property they can capture. In this study, we explore a unified framework for the joint inductive inference of identity and position embeddings without attributes. An inductive random walk embedding (IRWE) method is proposed, which combines multiple attention units to handle the random walk on graph topology and simultaneously derives identity and position embeddings that are jointly optimized. In particular, we demonstrate that some random walk statistics can be informative features to characterize node identities and positions while supporting the inductive embedding inference. Experiments validate the superior performance of IRWE beyond various baselines for the transductive and inductive inference of identity and position embeddings.
- [896] arXiv:2401.01783 (replaced) [pdf, ps, other]
-
Title: Approximating Numerical Fluxes Using Fourier Neural Operators for Hyperbolic Conservation LawsComments: 39 pages, 16 figuresSubjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Traditionally, classical numerical schemes have been employed to solve partial differential equations (PDEs) using computational methods. Recently, neural network-based methods have emerged. Despite these advancements, neural network-based methods, such as physics-informed neural networks (PINNs) and neural operators, exhibit deficiencies in robustness and generalization. To address these issues, numerous studies have integrated classical numerical frameworks with machine learning techniques, incorporating neural networks into parts of traditional numerical methods. In this study, we focus on hyperbolic conservation laws by replacing traditional numerical fluxes with neural operators. To this end, we developed loss functions inspired by established numerical schemes related to conservation laws and approximated numerical fluxes using Fourier neural operators (FNOs). Our experiments demonstrated that our approach combines the strengths of both traditional numerical schemes and FNOs, outperforming standard FNO methods in several respects. For instance, we demonstrate that our method is robust, has resolution invariance, and is feasible as a data-driven method. In particular, our method can make continuous predictions over time and exhibits superior generalization capabilities with out-of-distribution (OOD) samples, which are challenges that existing neural operator methods encounter.
- [897] arXiv:2401.02882 (replaced) [pdf, ps, html, other]
-
Title: SpatialVisVR: An Immersive, Multiplexed Medical Image Viewer With Contextual Similar-Patient SearchJai Prakash Veerla, Partha Sai Guttikonda, Amir Hajighasemi, Jillur Rahman Saurav, Aarti Darji, Cody T. Reynolds, Mohamed Mohamed, Mohammad S. Nasr, Helen H. Shang, Jacob M. LuberSubjects: Human-Computer Interaction (cs.HC); Tissues and Organs (q-bio.TO)
In contemporary pathology, multiplexed immunofluorescence (mIF) and multiplex immunohistochemistry (mIHC) present both significant opportunities and challenges. These methodologies shed light on intricate tumor microenvironment interactions, emphasizing the need for intuitive visualization tools to analyze vast biological datasets effectively. As electronic health records (EHR) proliferate and physicians face increasing information overload, the integration of advanced technologies becomes imperative. SpatialVisVR emerges as a versatile VR platform tailored for comparing medical images, with adaptability for data privacy on embedded hardware. Clinicians can capture pathology slides in real-time via mobile devices, leveraging SpatialVisVR's deep learning algorithm to match and display similar mIF images. This interface supports the manipulation of up to 100 multiplexed protein channels, thereby assisting in immuno-oncology decision-making. Ultimately, SpatialVisVR aims to streamline diagnostic processes, advocating for a comprehensive and efficient approach to immuno-oncology research and treatment.
- [898] arXiv:2401.03233 (replaced) [pdf, ps, html, other]
-
Title: Convergence Rate Maximization for Split Learning-based Control of EMG Prosthetic DevicesComments: Accepted to the 20th International Conference on Intelligent Environments (IE), 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Split Learning (SL) is a promising Distributed Learning approach in electromyography (EMG) based prosthetic control, due to its applicability within resource-constrained environments. Other learning approaches, such as Deep Learning and Federated Learning (FL), provide suboptimal solutions, since prosthetic devices are extremely limited in terms of processing power and battery life. The viability of implementing SL in such scenarios is caused by its inherent model partitioning, with clients executing the smaller model segment. However, selecting an inadequate cut layer hinders the training process in SL systems. This paper presents an algorithm for optimal cut layer selection in terms of maximizing the convergence rate of the model. The performance evaluation demonstrates that the proposed algorithm substantially accelerates the convergence in an EMG pattern recognition task for improving prosthetic device control.
- [899] arXiv:2401.05912 (replaced) [pdf, ps, html, other]
-
Title: Prompt-based mental health screening from social media textComments: To appear in BrasNam-2024Subjects: Computation and Language (cs.CL)
This article presents a method for prompt-based mental health screening from a large and noisy dataset of social media text. Our method uses GPT 3.5. prompting to distinguish publications that may be more relevant to the task, and then uses a straightforward bag-of-words text classifier to predict actual user labels. Results are found to be on pair with a BERT mixture of experts classifier, and incurring only a fraction of its training costs.
- [900] arXiv:2401.06291 (replaced) [pdf, ps, other]
-
Title: Frequency-Time Diffusion with Neural Cellular AutomataSubjects: Computer Vision and Pattern Recognition (cs.CV)
Despite considerable success, large Denoising Diffusion Models (DDMs) with UNet backbone pose practical challenges, particularly on limited hardware and in processing gigapixel images. To address these limitations, we introduce two Neural Cellular Automata (NCA)-based DDMs: Diff-NCA and FourierDiff-NCA. Capitalizing on the local communication capabilities of NCA, Diff-NCA significantly reduces the parameter counts of NCA-based DDMs. Integrating Fourier-based diffusion enables global communication early in the diffusion process. This feature is particularly valuable in synthesizing complex images with important global features, such as the CelebA dataset. We demonstrate that even a 331k parameter Diff-NCA can generate 512x512 pathology slices, while FourierDiff-NCA (1.1m parameters) reaches a three times lower FID score of 43.86, compared to the four times bigger UNet (3.94m parameters) with a score of 128.2. Additionally, FourierDiff-NCA can perform diverse tasks such as super-resolution, out-of-distribution image synthesis, and inpainting without explicit training.
- [901] arXiv:2401.07114 (replaced) [pdf, ps, other]
-
Title: Revisiting Sampson Approximations for Geometric Estimation ProblemsSubjects: Computer Vision and Pattern Recognition (cs.CV); Algebraic Geometry (math.AG)
Many problems in computer vision can be formulated as geometric estimation problems, i.e. given a collection of measurements (e.g. point correspondences) we wish to fit a model (e.g. an essential matrix) that agrees with our observations. This necessitates some measure of how much an observation ``agrees" with a given model. A natural choice is to consider the smallest perturbation that makes the observation exactly satisfy the constraints. However, for many problems, this metric is expensive or otherwise intractable to compute. The so-called Sampson error approximates this geometric error through a linearization scheme. For epipolar geometry, the Sampson error is a popular choice and in practice known to yield very tight approximations of the corresponding geometric residual (the reprojection error).
In this paper we revisit the Sampson approximation and provide new theoretical insights as to why and when this approximation works, as well as provide explicit bounds on the tightness under some mild assumptions. Our theoretical results are validated in several experiments on real data and in the context of different geometric estimation tasks. - [902] arXiv:2401.07301 (replaced) [pdf, ps, other]
-
Title: Small Language Model Can Self-correctSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Generative Language Models (LMs) such as ChatGPT have exhibited remarkable performance across various downstream tasks. Nevertheless, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. Previous studies have devised sophisticated pipelines and prompts to induce large LMs to exhibit the capability for self-correction. However, large LMs are explicitly prompted to verify and modify its answers separately rather than completing all steps spontaneously like humans. Moreover, these complex prompts are extremely challenging for small LMs to follow. In this paper, we introduce the \underline{I}ntrinsic \underline{S}elf-\underline{C}orrection (ISC) in generative language models, aiming to correct the initial output of LMs in a self-triggered manner, even for those small LMs with 6 billion parameters. Specifically, we devise a pipeline for constructing self-correction data and propose Partial Answer Masking (PAM), aiming to endow the model with the capability for intrinsic self-correction through fine-tuning. We conduct experiments using LMs with parameters sizes ranging from 6 billion to 13 billion in two tasks, including commonsense reasoning and factual knowledge reasoning. Our experiments demonstrate that the outputs generated using ISC outperform those generated without self-correction. We believe that the output quality of even small LMs can be further improved by empowering them with the ability to intrinsic self-correct.
- [903] arXiv:2401.10039 (replaced) [pdf, ps, other]
-
Title: GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR). Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of vision and linguistic knowledge. We propose a refined approach for ZS-EAR using VLMs, emphasizing fine-grained concept-description alignment that capitalizes on the rich semantic and contextual details in egocentric videos. In this paper, we introduce GPT4Ego, a straightforward yet remarkably potent VLM framework for ZS-EAR, designed to enhance the fine-grained alignment of concept and description between vision and language. Extensive experiments demonstrate GPT4Ego significantly outperforms existing VLMs on three large-scale egocentric video benchmarks, i.e., EPIC-KITCHENS-100 (33.2%, +9.4%), EGTEA (39.6%, +5.5%), and CharadesEgo (31.5%, +2.6%).
- [904] arXiv:2401.10368 (replaced) [pdf, ps, other]
-
Title: HRL-TSCH: A Hierarchical Reinforcement Learning-based TSCH Scheduler for IIoTComments: 17 pages, 13 figures, 5 tables, journalSubjects: Networking and Internet Architecture (cs.NI)
The Industrial Internet of Things (IIoT) demands adaptable Networked Embedded Systems (NES) for optimal performance. Combined with recent advances in Artificial Intelligence (AI), tailored solutions can be developed to meet specific application requirements. This study introduces HRL-TSCH, an approach rooted in Hierarchical Reinforcement Learning (HRL), to devise Time Slotted Channel Hopping (TSCH) schedules provisioning IIoT demand. HRL-TSCH employs dual policies: one at a higher level for TSCH schedule link management, and another at a lower level for timeslot and channel assignments. The proposed RL agents address a multi-objective problem, optimizing throughput, power efficiency, and network delay based on predefined application requirements. Simulation experiments demonstrate HRL-TSCH superiority over existing state-of-art approaches, effectively achieving an optimal balance between throughput, power consumption, and delay, thereby enhancing IIoT network performance.
- [905] arXiv:2401.10397 (replaced) [pdf, ps, other]
-
Title: Analyzing and Mitigating Bias for Vulnerable Classes: Towards Balanced Representation in DatasetDewant Katare, David Solans Noguero, Souneil Park, Nicolas Kourtellis, Marijn Janssen, Aaron Yi DingSubjects: Computer Vision and Pattern Recognition (cs.CV)
The accuracy and fairness of perception systems in autonomous driving are essential, especially for vulnerable road users such as cyclists, pedestrians, and motorcyclists who face significant risks in urban driving environments. While mainstream research primarily enhances class performance metrics, the hidden traits of bias inheritance in the AI models, class imbalances and disparities within the datasets are often overlooked. Our research addresses these issues by investigating class imbalances among vulnerable road users, with a focus on analyzing class distribution, evaluating performance, and assessing bias impact. Utilizing popular CNN models and Vision Transformers (ViTs) with the nuScenes dataset, our performance evaluation indicates detection disparities for underrepresented classes. Compared to related work, we focus on metric-specific and Cost-Sensitive learning for model optimization and bias mitigation, which includes data augmentation and resampling. Using the proposed mitigation approaches, we see improvement in IoU(\%) and NDS(\%) metrics from 71.3 to 75.6 and 80.6 to 83.7 for the CNN model. Similarly, for ViT, we observe improvement in IoU and NDS metrics from 74.9 to 79.2 and 83.8 to 87.1. This research contributes to developing reliable models while enhancing inclusiveness for minority classes in datasets.
- [906] arXiv:2401.11404 (replaced) [pdf, ps, html, other]
-
Title: PlasmoData.jl -- A Julia Framework for Modeling and Analyzing Complex Data as GraphsComments: 62 pages, 18 figures, 8 tablesSubjects: Mathematical Software (cs.MS); Machine Learning (cs.LG)
Datasets encountered in scientific and engineering applications appear in complex formats (e.g., images, multivariate time series, molecules, video, text strings, networks). Graph theory provides a unifying framework to model such datasets and enables the use of powerful tools that can help analyze, visualize, and extract value from data. In this work, we present PlasmoData.jl, an open-source, Julia framework that uses concepts of graph theory to facilitate the modeling and analysis of complex datasets. The core of our framework is a general data modeling abstraction, which we call a DataGraph. We show how the abstraction and software implementation can be used to represent diverse data objects as graphs and to enable the use of tools from topology, graph theory, and machine learning (e.g., graph neural networks) to conduct a variety of tasks. We illustrate the versatility of the framework by using real datasets: i) an image classification problem using topological data analysis to extract features from the graph model to train machine learning models; ii) a disease outbreak problem where we model multivariate time series as graphs to detect abnormal events; and iii) a technology pathway analysis problem where we highlight how we can use graphs to navigate connectivity. Our discussion also highlights how PlasmoData.jl leverages native Julia capabilities to enable compact syntax, scalable computations, and interfaces with diverse packages.
- [907] arXiv:2401.11542 (replaced) [pdf, ps, other]
-
Title: Nigel -- Mechatronic Design and Robust Sim2Real Control of an Over-Actuated Autonomous VehicleComments: Accepted in IEEE/ASME Transactions on Mechatronics (TMECH) and additionally accepted to be presented at IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM) 2024Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Simulation to reality (sim2real) transfer from a dynamics and controls perspective usually involves re-tuning or adapting the designed algorithms to suit real-world operating conditions, which often violates the performance guarantees established originally. This work presents a generalizable framework for achieving reliable sim2real transfer of autonomy-oriented control systems using multi-model multi-objective robust optimal control synthesis, which lends well to uncertainty handling and disturbance rejection with theoretical guarantees. Particularly, this work is centered around a novel actuation-redundant scaled autonomous vehicle called Nigel, with independent all-wheel drive and independent all-wheel steering architecture, whose enhanced configuration space bodes well for robust control applications. To this end, we present the mechatronic design, dynamics modeling, parameter identification, and robust stabilizing as well as tracking control of Nigel using the proposed framework, with exhaustive experimentation and benchmarking in simulation as well as real-world settings.
- [908] arXiv:2401.13100 (replaced) [pdf, ps, other]
-
Title: Bayesian sampling using interacting particlesSubjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Bayesian sampling is an important task in statistics and machine learning. Over the past decade, many ensemble-type sampling methods have been proposed. In contrast to the classical Markov chain Monte Carlo methods, these new methods deploy a large number of interactive samples, and the communication between these samples is crucial in speeding up the convergence. To justify the validity of these sampling strategies, the concept of interacting particles naturally calls for the mean-field theory. The theory establishes a correspondence between particle interactions encoded in a set of coupled ODEs/SDEs and a PDE that characterizes the evolution of the underlying distribution. This bridges numerical algorithms with the PDE theory used to show convergence in time. Many mathematical machineries are developed to provide the mean-field analysis, and we showcase two such examples: The coupling method and the compactness argument built upon the martingale strategy. The former has been deployed to show the convergence of ensemble Kalman sampler and ensemble Kalman inversion, and the latter will be shown to be immensely powerful in proving the validity of the Vlasov-Boltzmann simulator.
- [909] arXiv:2401.13248 (replaced) [pdf, ps, other]
-
Title: "Here's Your Evidence": False Consensus in Public Twitter Discussions of COVID-19 ScienceAlexandros Efstratiou, Marina Efstratiou, Satrio Yudhoatmojo, Jeremy Blackburn, Emiliano De CristofaroSubjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
The COVID-19 pandemic brought about an extraordinary rate of scientific papers on the topic that were discussed among the general public, although often in biased or misinformed ways. In this paper, we present a mixed-methods analysis aimed at examining whether public discussions were commensurate with the scientific consensus on several COVID-19 issues. We estimate scientific consensus based on samples of abstracts from preprint servers and compare against the volume of public discussions on Twitter mentioning these papers. We find that anti-consensus posts and users, though overall less numerous than pro-consensus ones, are vastly over-represented on Twitter, thus producing a false consensus effect. This transpires with favorable papers being disproportionately amplified, along with an influx of new anti-consensus user sign-ups. Finally, our content analysis highlights that anti-consensus users misrepresent scientific findings or question scientists' integrity in their efforts to substantiate their claims.
- [910] arXiv:2401.13920 (replaced) [pdf, ps, html, other]
-
Title: LocMoE: A Low-Overhead MoE for Large Language Model TrainingComments: 1. Update the font size of all figures 2. Update the format of formulas 3. Modify the format of the title caseSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
The Mixtures-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLM), which is favored due to its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and high latency of All-to-All communication, along with relatively redundant computation owing to large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-to-All procedure also significantly prolongs the training time. To alleviate the above performance problems, we propose a novel routing strategy that combines load balance and locality by converting partial inter-node communication to that of intra-node. Notably, we elucidate that there is a minimum threshold for expert capacity, calculated through the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications on the PanGu-Sigma model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experiment results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared to classical routers, such as hash router and switch router, without impacting the model accuracy.
- [911] arXiv:2401.14109 (replaced) [pdf, ps, other]
-
Title: CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor NetworksAndrei Tomut, Saeed S. Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbinder Singh, Faysal Ishtiaq, Cesar Muñoz, Prabdeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, Mehrazin Alizadeh, David Montero, Pablo Martin-Ramiro, Muhammad Ibrahim, Oussama Tahiri Alaoui, John Malcolm, Samuel Mugel, Roman OrusComments: 5 pages, 4 figures, 2 tables, and supplementary information of 2 pages and 1 figure. Revised version with new benchmarks for LlaMA2-7BSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantum Physics (quant-ph)
Large Language Models (LLMs) such as ChatGPT and LlaMA are advancing rapidly in generative Artificial Intelligence (AI), but their immense size poses significant challenges, such as huge training and inference costs, substantial energy demands, and limitations for on-site deployment. Traditional compression methods such as pruning, distillation, and low-rank approximation focus on reducing the effective number of neurons in the network, while quantization focuses on reducing the numerical precision of individual weights to reduce the model size while keeping the number of neurons fixed. While these compression methods have been relatively successful in practice, there is no compelling reason to believe that truncating the number of neurons is an optimal strategy. In this context, this paper introduces CompactifAI, an innovative LLM compression approach using quantum-inspired Tensor Networks that focuses on the model's correlation space instead, allowing for a more controlled, refined and interpretable model compression. Our method is versatile and can be implemented with - or on top of - other compression techniques. As a benchmark, we demonstrate that a combination of CompactifAI with quantization allows to reduce a 93% the memory size of LlaMA 7B, reducing also 70% the number of parameters, accelerating 50% the training and 25% the inference times of the model, and just with a small accuracy drop of 2% - 3%, going much beyond of what is achievable today by other compression techniques. Our methods also allow to perform a refined layer sensitivity profiling, showing that deeper layers tend to be more suitable for tensor network compression, which is compatible with recent observations on the ineffectiveness of those layers for LLM performance. Our results imply that standard LLMs are, in fact, heavily overparametrized, and do not need to be large at all.
- [912] arXiv:2401.14241 (replaced) [pdf, ps, other]
-
Title: New Algorithms for Computing Sibson Capacity and Arimoto CapacitySubjects: Information Theory (cs.IT)
The Sibson and Arimoto capacity, which are based on the Sibson and Arimoto mutual information (MI) of order {\alpha}, respectively, are well-known generalizations of the channel capacity C. In this study, we derive novel alternating optimization algorithms for computing these capacities by providing new variational characterizations of the Sibson and Arimoto MI. Moreover, we prove that all iterative algorithms for computing these capacities are equivalent under appropriate conditions imposed on their initial distributions.
- [913] arXiv:2401.14446 (replaced) [pdf, ps, html, other]
-
Title: Black-Box Access is Insufficient for Rigorous AI AuditsStephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-MenellComments: FAccT, 2024Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.
- [914] arXiv:2401.14589 (replaced) [pdf, ps, other]
-
Title: Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive BiasYu He Ke, Rui Yang, Sui An Lie, Taylor Xin Yi Lim, Hairil Rizal Abdullah, Daniel Shu Wei Ting, Nan LiuComments: 21 pages, 3 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field.
Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy.
Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 to facilitate interactions among four simulated agents to replicate clinical team dynamics. Each agent has a distinct role: 1) To make the final diagnosis after considering the discussions, 2) The devil's advocate and correct confirmation and anchoring bias, 3) The tutor and facilitator of the discussion to reduce premature closure bias, and 4) To record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses.
Results: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80).
Conclusions: The framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. The LLM-driven multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios. - [915] arXiv:2401.14652 (replaced) [pdf, ps, other]
-
Title: LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint OptimizationSubjects: Neural and Evolutionary Computing (cs.NE)
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient, making them well-suited for low-power edge devices. However, the pursuit of accuracy in current studies leads to large, long-timestep SNNs, conflicting with the resource constraints of these devices. In order to design lightweight and efficient SNNs, we propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process. Spatially, we present a novel Compressive Convolution block (CompConv) to expand the search space to support pruning and mixed-precision quantization. Temporally, we are the first to propose a compressive timestep search to identify the optimal number of timesteps under specific computation cost constraints. Finally, we formulate a joint optimization to simultaneously learn the architecture parameters and spatial-temporal compression strategies to achieve high performance while minimizing memory and computation costs. Experimental results on CIFAR-10, CIFAR-100, and Google Speech Command datasets demonstrate our proposed LitE-SNNs can achieve competitive or even higher accuracy with remarkably smaller model sizes and fewer computation costs.
- [916] arXiv:2401.15356 (replaced) [pdf, ps, other]
-
Title: A Decision Theoretic Framework for Measuring AI RelianceSubjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates the concepts of reliance as the probability the decision-maker follows the AI's recommendation from challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision making studies from literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. We evaluate these losses by comparing to a baseline and a benchmark for complementary performance defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.
- [917] arXiv:2401.15912 (replaced) [pdf, ps, other]
-
Title: An Efficient, High-Rate Scheme for Private Information Retrieval over the Gaussian MACSubjects: Information Theory (cs.IT)
This paper revisited the problem of Private Information Retrieval (PIR), where there are $N$ replicated non-communicating databases containing the same $M$ messages and a user who wishes to retrieve one of the messages without revealing the wanted message's index to the databases. However, we assume a block-fading additive white Gaussian noise multiple access channel (AWGN MAC) linking the user and the databases. Previous work \cite{shmuel2021private} presented a joint channel-PIR scheme, utilizing the Compute and Forward protocol, showing the potential of a joint channel-PIR scheme over a separated one. This paper proposes an improved joint scheme tailored for the PIR problem with $N$ databases over a block-fading AWGN. Unlike the C\&F protocol, our scheme offers reduced computational complexity while improving the scaling laws governing the achievable rate. Specifically, the achievable rate scales with the number of databases $N$ and the power $P$ similarly to the channel capacity without the privacy constraint and outperforms the C\&F-based approach. Furthermore, the analysis demonstrates that the improved rate exhibits only a finite gap from the unconstrained channel capacity -- one bit per second per Hz as $N$ increases.
- [918] arXiv:2401.16342 (replaced) [pdf, ps, other]
-
Title: On Achievable Rates for the Shotgun Sequencing Channel with ErasuresComments: Accepted for presentation at ISIT 2024Subjects: Information Theory (cs.IT)
In shotgun sequencing, the input string (typically, a long DNA sequence composed of nucleotide bases) is sequenced as multiple overlapping fragments of much shorter lengths (called \textit{reads}). Modelling the shotgun sequencing pipeline as a communication channel for DNA data storage, the capacity of this channel was identified in a recent work, assuming that the reads themselves are noiseless substrings of the original sequence. Modern shotgun sequencers however also output quality scores for each base read, indicating the confidence in its identification. Bases with low quality scores can be considered to be erased. Motivated by this, we consider the \textit{shotgun sequencing channel with erasures}, where each symbol in any read can be independently erased with some probability $\delta$. We identify achievable rates for this channel, using a random code construction and a decoder that uses typicality-like arguments to merge the reads.
- [919] arXiv:2401.16417 (replaced) [pdf, ps, other]
-
Title: Channel Coding with Mean and Variance Cost ConstraintsSubjects: Information Theory (cs.IT)
We consider channel coding for discrete memoryless channels (DMCs) with a novel cost constraint that constrains both the mean and the variance of the cost of the codewords. We show that the maximum (asymptotically) achievable rate under the new cost formulation is equal to the capacity-cost function; in particular, the strong converse holds. We further characterize the optimal second-order coding rate of these cost-constrained codes; in particular, the optimal second-order coding rate is finite. We then show that the second-order coding performance is strictly improved with feedback using a new variation of timid/bold coding, significantly broadening the applicability of timid/bold coding schemes from unconstrained compound-dispersion channels to all cost-constrained channels. Equivalent results on the minimum average probability of error are also given.
- [920] arXiv:2401.16445 (replaced) [pdf, ps, other]
-
Title: OMPGPT: A Generative Pre-trained Transformer Model for OpenMPSubjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Large language models (LLMs)such as ChatGPT have significantly advanced the field of Natural Language Processing (NLP). This trend led to the development of code-based large language models such as StarCoder, WizardCoder, and CodeLlama, which are trained extensively on vast repositories of code and programming languages. While the generic abilities of these code LLMs are useful for many programmers in tasks like code generation, the area of high-performance computing (HPC) has a narrower set of requirements that make a smaller and more domain-specific model a smarter choice. This paper presents OMPGPT, a novel domain-specific model meticulously designed to harness the inherent strengths of language models for OpenMP pragma generation. Furthermore, we leverage prompt engineering techniques from the NLP domain to create Chain-of-OMP, an innovative strategy designed to enhance OMPGPT's effectiveness. Our extensive evaluations demonstrate that OMPGPT outperforms existing large language models specialized in OpenMP tasks and maintains a notably smaller size, aligning it more closely with the typical hardware constraints of HPC environments. We consider our contribution as a pivotal bridge, connecting the advantage of language models with the specific demands of HPC tasks.
- [921] arXiv:2401.17083 (replaced) [pdf, ps, other]
-
Title: Online Robot Navigation and Manipulation with Distilled Vision-Language ModelsComments: International Conference on Robotics and Automation (ICRA)Subjects: Robotics (cs.RO)
Autonomous robot navigation within the dynamic unknown environment is of crucial significance for mobile robotic applications including robot navigation in last-mile delivery and robot-enabled automated supplies in industrial and hospital delivery applications. Current solutions still suffer from limitations, such as the robot cannot recognize unknown objects in real-time and cannot navigate freely in a dynamic, narrow, and complex environment. We propose a complete software framework for autonomous robot perception and navigation within very dense obstacles and dense human crowds. First, we propose a framework that accurately detects and segments open-world object categories in a zero-shot manner, which overcomes the over-segmentation limitation of the current SAM model. Second, we proposed the distillation strategy to distill the knowledge to segment the free space of the walkway for robot navigation without the label. In the meantime, we design the trimming strategy that works collaboratively with distillation to enable lightweight inference to deploy the neural network on edge devices such as NVIDIA-TX2 or Xavier NX during autonomous navigation. Integrated into the robot navigation system, extensive experiments demonstrate that our proposed framework has achieved superior performance in terms of both accuracy and efficiency in robot scene perception and autonomous robot navigation.
- [922] arXiv:2401.17173 (replaced) [pdf, ps, other]
-
Title: Zero-Shot Reinforcement Learning via Function EncodersSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Although reinforcement learning (RL) can solve many challenging sequential decision making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in finding a good representation for the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm which represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent has information on how the current task relates to previously seen tasks via a coherent vector representation. Thus, the agent is able to achieve transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.
- [923] arXiv:2401.17235 (replaced) [pdf, ps, html, other]
-
Title: Explicit Good Codes Approaching Distance 1 in Ulam MetricSubjects: Information Theory (cs.IT); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
The Ulam distance of two permutations on $[n]$ is $n$ minus the length of their longest common subsequence. In this paper, we show that for every $\varepsilon>0$, there exists some $\alpha>0$, and an infinite set $\Gamma\subseteq \mathbb{N}$, such that for all $n\in\Gamma$, there is an explicit set $C_n$ of $(n!)^{\alpha}$ many permutations on $[n]$, such that every pair of permutations in $C_n$ has pairwise Ulam distance at least $(1-\varepsilon)\cdot n$. Moreover, we can compute the $i^{\text{th}}$ permutation in $C_n$ in poly$(n)$ time and can also decode in poly$(n)$ time, a permutation $\pi$ on $[n]$ to its closest permutation $\pi^*$ in $C_n$, if the Ulam distance of $\pi$ and $\pi^*$ is less than $ \frac{(1-\varepsilon)\cdot n}{4} $.
Previously, it was implicitly known by combining works of Goldreich and Wigderson [Israel Journal of Mathematics'23] and Farnoud, Skachek, and Milenkovic [IEEE Transactions on Information Theory'13] in a black-box manner, that it is possible to explicitly construct $(n!)^{\Omega(1)}$ many permutations on $[n]$, such that every pair of them have pairwise Ulam distance at least $\frac{n}{6}\cdot (1-\varepsilon)$, for any $\varepsilon>0$, and the bound on the distance can be improved to $\frac{n}{4}\cdot (1-\varepsilon)$ if the construction of Goldreich and Wigderson is directly analyzed in the Ulam metric. - [924] arXiv:2402.00152 (replaced) [pdf, ps, other]
-
Title: Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev LossSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.
- [925] arXiv:2402.00195 (replaced) [pdf, ps, html, other]
-
Title: Dataset Condensation Driven Machine UnlearningSubjects: Machine Learning (cs.LG)
The current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning. The naive approach to unlearning training data by retraining over the complement of the forget samples is susceptible to computational challenges. These challenges have been effectively addressed through a collection of techniques falling under the umbrella of machine unlearning. However, there still exists a lack of sufficiency in handling persistent computational challenges in harmony with the utility and privacy of unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach, which involves removing data from `condensed model', which can be employed to quickly train any arbitrary model without being influenced by unlearning samples. The corresponding code is available at \href{this https URL}{URL}.
- [926] arXiv:2402.00281 (replaced) [pdf, ps, other]
-
Title: Guided Interpretable Facial Expression Recognition via Spatial Action Unit CuesComments: 15 pages, 11 figures, 3 tables, International Conference on Automatic Face and Gesture Recognition (FG 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (\aus) from a codebook to facial regions for the visual interpretation of expressions. In this paper, the same expert steps are followed. A new learning strategy is proposed to explicitly incorporate \au cues into classifier training, allowing to train deep interpretable models. During training, this \au codebook is used, along with the input image expression label, and facial landmarks, to construct a \au heatmap that indicates the most discriminative image regions of interest w.r.t the facial expression. This valuable spatial cue is leveraged to train a deep interpretable classifier for FER. This is achieved by constraining the spatial layer features of a classifier to be correlated with \au heatmaps. Using a composite loss, the classifier is trained to correctly classify an image while yielding interpretable visual layer-wise attention correlated with \au maps, simulating the expert decision process. Our strategy only relies on image class expression for supervision, without additional manual annotations. Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time. Our extensive evaluation on two public benchmarks \rafdb, and \affectnet datasets shows that our proposed strategy can improve layer-wise interpretability without degrading classification performance. In addition, we explore a common type of interpretable classifiers that rely on class activation mapping (CAM) methods, and show that our approach can also improve CAM interpretability.
- [927] arXiv:2402.00447 (replaced) [pdf, ps, html, other]
-
Title: A Survey of Data-Efficient Graph LearningComments: Accepted by Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Graph-structured data, prevalent in domains ranging from social networks to biochemical analysis, serve as the foundation for diverse real-world systems. While graph neural networks demonstrate proficiency in modeling this type of data, their success is often reliant on significant amounts of labeled data, posing a challenge in practical scenarios with limited annotation resources. To tackle this problem, tremendous efforts have been devoted to enhancing graph machine learning performance under low-resource settings by exploring various approaches to minimal supervision. In this paper, we introduce a novel concept of Data-Efficient Graph Learning (DEGL) as a research frontier, and present the first survey that summarizes the current progress of DEGL. We initiate by highlighting the challenges inherent in training models with large labeled data, paving the way for our exploration into DEGL. Next, we systematically review recent advances on this topic from several key aspects, including self-supervised graph learning, semi-supervised graph learning, and few-shot graph learning. Also, we state promising directions for future research, contributing to the evolution of graph machine learning.
- [928] arXiv:2402.01536 (replaced) [pdf, ps, html, other]
-
Title: Homogenization Effects of Large Language Models on Human Creative IdeationComments: Accepted to C&C 2024Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Large language models (LLMs) are now being used in a wide variety of contexts, including as creativity support tools (CSTs) intended to help their users come up with new ideas. But do LLMs actually support user creativity? We hypothesized that the use of an LLM as a CST might make the LLM's users feel more creative, and even broaden the range of ideas suggested by each individual user, but also homogenize the ideas suggested by different users. We conducted a 36-participant comparative user study and found, in accordance with the homogenization hypothesis, that different users tended to produce less semantically distinct ideas with ChatGPT than with an alternative CST. Additionally, ChatGPT users generated a greater number of more detailed ideas, but felt less responsible for the ideas they generated. We discuss potential implications of these findings for users, designers, and developers of LLM-based CSTs.
- [929] arXiv:2402.01590 (replaced) [pdf, ps, other]
-
Title: NeuroCine: Decoding Vivid Video Sequences from Human Brain ActivtiesComments: under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
In the pursuit to understand the intricacies of human brain's visual processing, reconstructing dynamic visual experiences from brain activities emerges as a challenging yet fascinating endeavor. While recent advancements have achieved success in reconstructing static images from non-invasive brain recordings, the domain of translating continuous brain activities into video format remains underexplored. In this work, we introduce NeuroCine, a novel dual-phase framework to targeting the inherent challenges of decoding fMRI data, such as noises, spatial redundancy and temporal lags. This framework proposes spatial masking and temporal interpolation-based augmentation for contrastive learning fMRI representations and a diffusion model enhanced by dependent prior noise for video generation. Tested on a publicly available fMRI dataset, our method shows promising results, outperforming the previous state-of-the-art models by a notable margin of ${20.97\%}$, ${31.00\%}$ and ${12.30\%}$ respectively on decoding the brain activities of three subjects in the fMRI dataset, as measured by SSIM. Additionally, our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.
- [930] arXiv:2402.02382 (replaced) [pdf, ps, other]
-
Title: Revisiting the Power of Prompt for Visual TuningComments: Accepted by ICML2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Visual prompt tuning (VPT) is a promising solution incorporating learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often encounter challenges like prompt initialization, prompt length, and subpar performance in self-supervised pretraining, hindering successful contextual adaptation. This study commences by exploring the correlation evolvement between prompts and patch tokens during proficient training. Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. The strategic initialization, a stand-in for the previous initialization, substantially improves performance in fine-tuning. To refine further, we optimize token construction with a streamlined pipeline that maintains excellent performance with almost no increase in computational expenses compared to VPT. Exhaustive experiments show our proposed approach outperforms existing methods by a remarkable margin. For instance, it surpasses full fine-tuning in 19 out of 24 tasks, using less than 0.4% of learnable parameters on the FGVC and VTAB-1K benchmarks. Notably, our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%. Besides, the experimental results demonstrate the proposed SPT is robust to prompt lengths and scales well with model capacity and training data size. We finally provide an insightful exploration into the amount of target data facilitating the adaptation of pre-trained models to downstream tasks. The code is available at this https URL.
- [931] arXiv:2402.02807 (replaced) [pdf, ps, other]
-
Title: Are Sounds Sound for Phylogenetic Reconstruction?Comments: Paper accepted for SIGTYP (2024): Häuser, Luise; Jäger, Gerhard; List, Johann-Mattis; Rama, Taraka; and Stamatakis, Alexandros (2024): Are sounds sound for phylogenetic reconstruction? In: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP 2024)Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.
- [932] arXiv:2402.02868 (replaced) [pdf, ps, other]
-
Title: Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation ProblemMaciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr MiłośComments: ICML 2024Subjects: Machine Learning (cs.LG)
Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, a model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning, on which the model behaved well due to pre-training. This way, we lose the anticipated transfer benefits. We identify conditions when this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma's Revenge environments, we show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities. In particular, in NetHack, we achieve a new state-of-the-art for neural models, improving the previous best score from $5$K to over $10$K points in the Human Monk scenario.
- [933] arXiv:2402.03837 (replaced) [pdf, ps, other]
-
Title: Expressivity of Geometric Inhomogeneous Random Graphs -- Metric and Non-MetricSubjects: Social and Information Networks (cs.SI); Discrete Mathematics (cs.DM)
Recently there has been increased interest in fitting generative graph models to real-world networks. In particular, Bläsius et al. have proposed a framework for systematic evaluation of the expressivity of random graph models. We extend this framework to Geometric Inhomogeneous Random Graphs (GIRGs). This includes a family of graphs induced by non-metric distance functions which allow capturing more complex models of partial similarity between nodes as a basis of connection - as well as homogeneous and non-homogeneous feature spaces. As part of the extension, we develop schemes for estimating the multiplicative constant and the long-range parameter in the connection probability. Moreover, we devise an algorithm for sampling Minimum-Component-Distance GIRGs whose runtime is linear both in the number of vertices and in the dimension of the underlying geometric space. Our results provide evidence that GIRGs are more realistic candidates with respect to various graph features such as closeness centrality, betweenness centrality, local clustering coefficient, and graph effective diameter, while they face difficulties to replicate higher variance and more extreme values of graph statistics observed in real-world networks.
- [934] arXiv:2402.04005 (replaced) [pdf, ps, other]
-
Title: Bayesian Uncertainty for Gradient Aggregation in Multi-Task LearningSubjects: Machine Learning (cs.LG)
As machine learning becomes more prominent there is a growing demand to perform several inference tasks in parallel. Running a dedicated model for each task is computationally expensive and therefore there is a great interest in multi-task learning (MTL). MTL aims at learning a single model that solves several tasks efficiently. Optimizing MTL models is often achieved by computing a single gradient per task and aggregating them for obtaining a combined update direction. However, these approaches do not consider an important aspect, the sensitivity in the gradient dimensions. Here, we introduce a novel gradient aggregation approach using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induce a distribution over the gradients of the tasks. This additional valuable information allows us to quantify the uncertainty in each of the gradients dimensions, which can then be factored in when aggregating them. We empirically demonstrate the benefits of our approach in a variety of datasets, achieving state-of-the-art performance.
- [935] arXiv:2402.04825 (replaced) [pdf, ps, other]
-
Title: Fast Timing-Conditioned Latent Audio DiffusionComments: Accepted to ICML 2024. Code: this https URL. Metrics: this https URL. Demo: this https URLSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding. Further, most previous works do not tackle that music and sound effects naturally vary in their duration. Our research focuses on the efficient generation of long-form, variable-length stereo music and sounds at 44.1kHz using text prompts with a generative model. Stable Audio is based on latent diffusion, with its latent defined by a fully-convolutional variational autoencoder. It is conditioned on text prompts as well as timing embeddings, allowing for fine control over both the content and length of the generated music and sounds. Stable Audio is capable of rendering stereo signals of up to 95 sec at 44.1kHz in 8 sec on an A100 GPU. Despite its compute efficiency and fast inference, it is one of the best in two public text-to-music and -audio benchmarks and, differently from state-of-the-art models, can generate music with structure and stereo sounds.
- [936] arXiv:2402.05099 (replaced) [pdf, ps, other]
-
Title: Hydragen: High-Throughput LLM Inference with Shared PrefixesSubjects: Machine Learning (cs.LG)
Transformer-based large language models (LLMs) are now deployed to hundreds of millions of users. LLM inference is commonly performed on batches of sequences that share a prefix, such as few-shot examples or a chatbot system prompt. Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch. In this work, we introduce Hydragen, a hardware-aware exact implementation of attention with shared prefixes. Hydragen computes attention over the shared prefix and unique suffixes separately. This decomposition enables efficient prefix attention by batching queries together across sequences, reducing redundant memory reads and enabling the use of hardware-friendly matrix multiplications. Our method can improve end-to-end CodeLlama-13b throughput by up to 32x against competitive baselines, with speedup growing with the batch size and shared prefix length. Hydragen also enables the use of very long shared contexts: with a large batch size, increasing the prefix length from 1K to 16K tokens decreases Hydragen throughput by less than 15%, while the throughput of baselines drops by over 90%. Hydragen generalizes beyond simple prefix-suffix decomposition and can be applied to tree-based prompt sharing patterns, allowing us to further reduce inference time on competitive programming problems by 55%.
- [937] arXiv:2402.05132 (replaced) [pdf, ps, other]
-
Title: TexShape: Information Theoretic Sentence Embedding for Language ModelsComments: Submitted to the 2024 IEEE International Symposium on Information TheorySubjects: Computation and Language (cs.CL); Information Theory (cs.IT)
With the exponential growth in data volume and the emergence of data-intensive applications, particularly in the field of machine learning, concerns related to resource utilization, privacy, and fairness have become paramount. This paper focuses on the textual domain of data and addresses challenges regarding encoding sentences to their optimized representations through the lens of information-theory. In particular, we use empirical estimates of mutual information, using the Donsker-Varadhan definition of Kullback-Leibler divergence. Our approach leverages this estimation to train an information-theoretic sentence embedding, called TexShape, for (task-based) data compression or for filtering out sensitive information, enhancing privacy and fairness. In this study, we employ a benchmark language model for initial text representation, complemented by neural networks for information-theoretic compression and mutual information estimations. Our experiments demonstrate significant advancements in preserving maximal targeted information and minimal sensitive information over adverse compression ratios, in terms of predictive accuracy of downstream models that are trained using the compressed data.
- [938] arXiv:2402.07844 (replaced) [pdf, ps, html, other]
-
Title: Mercury: An Efficiency Benchmark for LLM Code SynthesisSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
Amidst the recent strides in evaluating Large Language Models for Code (Code-LLMs), existing benchmarks have mainly focused on functional correctness, overlooking the importance of computational efficiency. To fill the gap, we present Mercury, the first computational efficiency benchmark for Code-LLMs. It comprises 1,889 Python tasks, each with adequate solutions to support a runtime distribution. Based on the distribution, we introduce a new metric Beyond, which computes a runtime-percentile-weighted Pass score to reflect functional correctness and computational efficiency simultaneously. On Mercury, leading Code-LLMs can achieve 67% on Pass, while less than 50% on Beyond. Given that an ideal Beyond score would be aligned with the Pass score, it indicates that while Code-LLMs exhibit impressive capabilities in generating functionally correct code, there remains a notable gap in their efficiency. Finally, our empirical experiments reveal that Direct Preference Optimization (DPO) serves as a robust baseline for enhancing computational efficiency compared with Supervised Fine Tuning (SFT), which paves a promising avenue for future exploration of efficient code generation.
- [939] arXiv:2402.07899 (replaced) [pdf, ps, html, other]
-
Title: A systematic investigation of learnability from single child linguistic inputComments: Please cite as Qin, Y., Wang, W., and Lake, B. M. (2024). A systematic investigation of learnability from single child linguistic input. In Proceedings of the 46th Annual Conference of the Cognitive Science SocietySubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Language models (LMs) have demonstrated remarkable proficiency in generating linguistically coherent text, sparking discussions about their relevance to understanding human language learnability. However, a significant gap exists between the training data for these models and the linguistic input a child receives. LMs are typically trained on data that is orders of magnitude larger and fundamentally different from child-directed speech (Warstadt and Bowman, 2022; Warstadt et al., 2023; Frank, 2023a). Addressing this discrepancy, our research focuses on training LMs on subsets of a single child's linguistic input. Previously, Wang, Vong, Kim, and Lake (2023) found that LMs trained in this setting can form syntactic and semantic word clusters and develop sensitivity to certain linguistic phenomena, but they only considered LSTMs and simpler neural networks trained from just one single-child dataset. Here, to examine the robustness of learnability from single-child input, we systematically train six different model architectures on five datasets (3 single-child and 2 baselines). We find that the models trained on single-child datasets showed consistent results that matched with previous work, underscoring the robustness of forming meaningful syntactic and semantic representations from a subset of a child's linguistic input.
- [940] arXiv:2402.08264 (replaced) [pdf, ps, other]
-
Title: On Iiro Honkala's contributions to identifying codesSubjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
A set $C$ of vertices in a graph $G=(V,E)$ is an identifying code if it is dominating and any two vertices of $V$ are dominated by distinct sets of codewords. This paper presents a survey of Iiro Honkala's contributions to the study of identifying codes with respect to several aspects: complexity of computing an identifying code, combinatorics in binary Hamming spaces, infinite grids, relationships between identifying codes and usual parameters in graphs, structural properties of graphs admitting identifying codes, and number of optimal identifying codes.
- [941] arXiv:2402.08265 (replaced) [pdf, ps, other]
-
Title: A Dense Reward View on Aligning Text-to-Image Diffusion with PreferenceComments: 41st International Conference on Machine Learning (ICML 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Aligning text-to-image diffusion model (T2I) with preference has been gaining increasing research attention. While prior works exist on directly optimizing T2I by preference data, these methods are developed under the bandit assumption of a latent reward on the entire diffusion reverse chain, while ignoring the sequential nature of the generation process. This may harm the efficacy and efficiency of preference alignment. In this paper, we take on a finer dense reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. In particular, we introduce temporal discounting into DPO-style explicit-reward-free objectives, to break the temporal symmetry therein and suit the T2I generation hierarchy. In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines, both quantitatively and qualitatively. Further investigations are conducted to illustrate the insight of our approach.
- [942] arXiv:2402.08674 (replaced) [pdf, ps, other]
-
Title: Human Curriculum Effects Emerge with In-Context Learning in Neural NetworksComments: 7 pages, 4 figures, accepted as a talk + full paper at CogSci 2024Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tradeoff spontaneously emerges with ``in-context learning'' (ICL) both in neural networks trained with metalearning and in large language models (LLMs). ICL is the ability to learn new tasks ``in context'' -- without weight changes -- via an inner-loop algorithm implemented in activation dynamics. Experiments with pretrained LLMs and metalearning transformers show that ICL exhibits the blocking advantage demonstrated in humans on a task involving rule-like structure, and conversely, that concurrent in-weight learning reproduces the interleaving advantage observed in humans on tasks lacking such structure.
- [943] arXiv:2402.09833 (replaced) [pdf, ps, other]
-
Title: R4: rapid reproducible robotics research open hardware control systemSubjects: Robotics (cs.RO)
A key component of any robot is the interface between ROS2 software and physical motors. New robots often use arbitrary, messy mixtures of closed and open motor drivers and error-prone physical mountings, wiring, and connectors to interface them. There is a need for a standardizing OSH component to abstract this complexity, as Arduino did for interfacing to smaller components. We present a OSH printed circuit board to solve this problem once and for all. On the high-level side, it interfaces to Arduino Giga -- acting as an unusually large and robust shield -- and thus to existing open source ROS software stacks. On the lower-level side, it interfaces to existing emerging standard open hardware including OSH motor drivers and relays, which can already be used to drive fully open hardware wheeled and arm robots. This enables the creation of a family of standardized, fully open hardware, fully reproducible, research platforms.
- [944] arXiv:2402.10228 (replaced) [pdf, ps, other]
-
Title: HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex EnvironmentsComments: Proceedings of the $\mathit{41}^{st}$ International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s). Invited talk in Informs Optimization Conference 2024 and International Symposium on Mathematical Programming 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework featuring the hypermodel and index sampling schemes that enable computation-efficient incremental approximation for the posteriors associated with general value functions without the need for conjugacy, and data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent stands out as the first method to offer robust performance in large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in both data and computation under the Atari benchmark. The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection -- a non-trivial martingale extension of the Johnson-Lindenstrauss. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.
- [945] arXiv:2402.11295 (replaced) [pdf, ps, html, other]
-
Title: OneBit: Towards Extremely Low-bit Large Language ModelsComments: 15 pages, 6 figures, 5 tablesSubjects: Computation and Language (cs.CL)
Model quantification uses low bit-width values to represent the weight matrices of models, which is a promising approach to reduce both storage and computational overheads of deploying highly anticipated LLMs. However, existing quantization methods suffer severe performance degradation when the bit-width is extremely reduced, and thus focus on utilizing 4-bit or 8-bit values to quantize models. This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the extremely low bit-width deployment of LLMs. For this target, we introduce a 1-bit quantization-aware training (QAT) framework named OneBit, including a novel 1-bit parameter representation method to better quantize LLMs as well as an effective parameter initialization method based on matrix decomposition to improve the convergence speed of the QAT framework. Sufficient experimental results indicate that OneBit achieves good performance (at least 83% of the non-quantized performance) with robust training processes when only using 1-bit weight matrices.
- [946] arXiv:2402.12563 (replaced) [pdf, ps, other]
-
Title: Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language ModelsComments: 12 figures, 9 tablesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The recent success of Large Language Models (LLMs) has catalyzed an increasing interest in their self-correction capabilities. This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs, attempting to address the ongoing debate about its feasibility. Our research has identified an important latent factor - the "confidence" of LLMs - during the self-correction process. Overlooking this factor may cause the models to over-criticize themselves, resulting in unreliable conclusions regarding the efficacy of self-correction. We have experimentally observed that LLMs possess the capability to understand the "confidence" in their own responses. It motivates us to develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence", facilitating intrinsic self-corrections. We conduct extensive experiments and demonstrate that our IoE-based Prompt can achieve a consistent improvement regarding the accuracy of self-corrected responses over the initial answers. Our study not only sheds light on the underlying factors affecting self-correction in LLMs, but also introduces a practical framework that utilizes the IoE prompting principle to efficiently improve self-correction capabilities with "confidence". The code is available at this https URL.
- [947] arXiv:2402.12602 (replaced) [pdf, ps, other]
-
Title: Physically Consistent Modeling of Stacked Intelligent Metasurfaces Implemented with Beyond Diagonal RISComments: Accepted by IEEE for publicationSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Stacked intelligent metasurface (SIM) has emerged as a technology enabling wave domain beamforming through multiple stacked reconfigurable intelligent surfaces (RISs). SIM has been implemented so far with diagonal RIS (D-RIS), while SIM implemented with beyond diagonal RIS (BD-RIS) remains unexplored. Furthermore, a model of SIM accounting for mutual coupling is not yet available. To fill these gaps, we derive a physically consistent channel model for SIM-aided systems and clarify the assumptions needed to obtain the simplified model used in related works. Using this model, we show that 1-layer SIM implemented with BD-RIS achieves the performance upper bound with limited complexity.
- [948] arXiv:2402.12797 (replaced) [pdf, ps, other]
-
Title: A Geometric Algorithm for Tubular Shape Reconstruction from Skeletal RepresentationComments: 9 pages (without reference), 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)
We introduce a novel approach for the reconstruction of tubular shapes from skeletal representations. Our method processes all skeletal points as a whole, eliminating the need for splitting input structure into multiple segments. We represent the tubular shape as a truncated signed distance function (TSDF) in a voxel hashing manner, in which the signed distance between a voxel center and the object is computed through a simple geometric algorithm. Our method does not involve any surface sampling scheme or solving large matrix equations, and therefore is a faster and more elegant solution for tubular shape reconstruction compared to other approaches. Experiments demonstrate the efficiency and effectiveness of the proposed method. Code is avaliable at this https URL.
- [949] arXiv:2402.14528 (replaced) [pdf, ps, other]
-
Title: ACE : Off-Policy Actor-Critic with Causality-Aware Entropy RegularizationTianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe XuComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The varying significance of distinct primitive behaviors during the policy learning process has been overlooked by prior model-free RL algorithms. Leveraging this insight, we explore the causal relationship between different action dimensions and rewards to evaluate the significance of various primitive behaviors during training. We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration. Furthermore, to prevent excessive focus on specific primitive behaviors, we analyze the gradient dormancy phenomenon and introduce a dormancy-guided reset mechanism to further enhance the efficacy of our method. Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and efficient sample efficiency of our approach. Benchmark results and videos are available at this https URL.
- [950] arXiv:2402.14693 (replaced) [pdf, ps, other]
-
Title: Joint AP-UE Association and Power Factor Optimization for Distributed Massive MIMOSubjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
The uplink sum-throughput of distributed massive multiple-input-multiple-output (mMIMO) networks depends majorly on Access point (AP)-User Equipment (UE) association and power control. The AP-UE association and power control both are important problems in their own right in distributed mMIMO networks to improve scalability and reduce front-haul load of the network, and to enhance the system performance by mitigating the interference and boosting the desired signals, respectively. Unlike previous studies, which focused primarily on addressing these two problems separately, this work addresses the uplink sum-throughput maximization problem in distributed mMIMO networks by solving the joint AP-UE association and power control problem, while maintaining Quality-of-Service (QoS) requirements for each UE. To improve scalability, we present an l1-penalty function that delicately balances the trade-off between spectral efficiency (SE) and front-haul signaling load. Our proposed methodology leverages fractional programming, Lagrangian dual formation, and penalty functions to provide an elegant and effective iterative solution with guaranteed convergence. Extensive numerical simulations validate the efficacy of the proposed technique for maximizing sum-throughput while considering the joint AP-UE association and power control problem, demonstrating its superiority over approaches that address these problems individually. Furthermore, the results show that the introduced penalty function can help us effectively control the maximum front-haul load.
- [951] arXiv:2402.15119 (replaced) [pdf, ps, other]
-
Title: A multidisciplinary framework for deconstructing bots' pluripotency in dualistic antagonismSubjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Anthropomorphic social bots are engineered to emulate human verbal communication and generate toxic or inflammatory content across social networking services (SNSs). Bot-disseminated misinformation could subtly yet profoundly reshape societal processes by complexly interweaving factors like repeated disinformation exposure, amplified political polarization, compromised indicators of democratic health, shifted perceptions of national identity, propagation of false social norms, and manipulation of collective memory over time. However, extrapolating bots' pluripotency across hybridized, multilingual, and heterogeneous media ecologies from isolated SNS analyses remains largely unknown, underscoring the need for a comprehensive framework to characterise bots' emergent risks to civic discourse. Here we propose an interdisciplinary framework to characterise bots' pluripotency, incorporating quantification of influence, network dynamics monitoring, and interlingual feature analysis. When applied to the geopolitical discourse around the Russo-Ukrainian conflict, results from interlanguage toxicity profiling and network analysis elucidated spatiotemporal trajectories of pro-Russian and pro-Ukrainian human and bots across hybrid SNSs. Weaponized bots predominantly inhabited X, while human primarily populated Reddit in the social media warfare. This rigorous framework promises to elucidate interlingual homogeneity and heterogeneity in bots' pluripotent behaviours, revealing synergistic human-bot mechanisms underlying regimes of information manipulation, echo chamber formation, and collective memory manifestation in algorithmically structured societies.
- [952] arXiv:2402.19223 (replaced) [pdf, ps, other]
-
Title: Edit and Alphabet-Ordering Sensitivity of Lex-parseSubjects: Data Structures and Algorithms (cs.DS)
We investigate the compression sensitivity [Akagi et al., 2023] of lex-parse [Navarro et al., 2021] for two operations: (1) single character edit and (2) modification of the alphabet ordering, and give tight upper and lower bounds for both operations. For both lower bounds, we use the family of Fibonacci words. For the bounds on edit operations, our analysis makes heavy use of properties of the Lyndon factorization of Fibonacci words to characterize the structure of lex-parse.
- [953] arXiv:2403.02683 (replaced) [pdf, ps, html, other]
-
Title: Learning to Defer to a Population: A Meta-Learning ApproachComments: Accepted at the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The learning to defer (L2D) framework allows autonomous systems to be safe and robust by allocating difficult decisions to a human expert. All existing work on L2D assumes that each expert is well-identified, and if any expert were to change, the system should be re-trained. In this work, we alleviate this constraint, formulating an L2D system that can cope with never-before-seen experts at test-time. We accomplish this by using meta-learning, considering both optimization- and model-based variants. Given a small context set to characterize the currently available expert, our framework can quickly adapt its deferral policy. For the model-based approach, we employ an attention mechanism that is able to look for points in the context set that are similar to a given test point, leading to an even more precise assessment of the expert's abilities. In the experiments, we validate our methods on image recognition, traffic sign detection, and skin lesion diagnosis benchmarks.
- [954] arXiv:2403.03663 (replaced) [pdf, ps, html, other]
-
Title: Robust Safety-Critical Control for Systems with Sporadic Measurements and Dwell Time ConstraintsComments: Extended version contains additional commentary and the proofs to all lemmas and theoremsSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper presents extensions of control barrier function (CBF) theory to systems with disturbances wherein a controller only receives measurements infrequently and operates open-loop between measurements, while still satisfying state constraints. The paper considers both impulsive and continuous actuators, and models the actuators, measurements, disturbances, and timing constraints as a hybrid dynamical system. We then design an open-loop observer that bounds the worst-case uncertainty between measurements. We develop definitions of CBFs for both actuation cases, and corresponding conditions on the control input to guarantee satisfaction of the state constraints. We apply these conditions to simulations of a satellite rendezvous in an elliptical orbit and autonomous orbit stationkeeping.
- [955] arXiv:2403.03913 (replaced) [pdf, ps, html, other]
-
Title: Multipolar opinion evolution in biased networksSubjects: Systems and Control (eess.SY)
Motivated by empirical research on bias and opinion formation, we formulate a multidimensional nonlinear opinion-dynamical model where agents have individual biases, which are fixed, as well as opinions, which evolve. The dimensions represent competing options, of which each agent has a relative opinion, and are coupled through normalization of the opinion vector. This can capture, for example, an individual's relative trust in different media. In special cases including where biases are uniform across agents our model achieves consensus, but in general, behaviors are richer and capture multipolar opinion distributions. We examine general fixed points of the system, as well as special cases such as zero biases toward certain options or partitioned decision sets. Lastly, we demonstrate that our model exhibits polarization when biases are spatially correlated across the network, while, as empirical research suggests, a mixed community can mediate biases.
- [956] arXiv:2403.04112 (replaced) [pdf, ps, other]
-
Title: Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous DrivingComments: Published at IEEE European Control Conference 2024Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel multi-modal Multi-Object Tracking (MOT) algorithm for self-driving cars that combines camera and LiDAR data. Camera frames are processed with a state-of-the-art 3D object detector, whereas classical clustering techniques are used to process LiDAR observations. The proposed MOT algorithm comprises a three-step association process, an Extended Kalman filter for estimating the motion of each detected dynamic obstacle, and a track management phase. The EKF motion model requires the current measured relative position and orientation of the observed object and the longitudinal and angular velocities of the ego vehicle as inputs. Unlike most state-of-the-art multi-modal MOT approaches, the proposed algorithm does not rely on maps or knowledge of the ego global pose. Moreover, it uses a 3D detector exclusively for cameras and is agnostic to the type of LiDAR sensor used. The algorithm is validated both in simulation and with real-world data, with satisfactory results.
- [957] arXiv:2403.08002 (replaced) [pdf, ps, html, other]
-
Title: Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluationJuan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Serena Yeung-Levy, Curtis P. Langlotz, Sheng Wang, Hoifung PoonSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating state-of-the-art pre-trained models for image and text modalities, and focusing on training a lightweight adapter to ground each modality to the text embedding space, as exemplified by LLaVA-Med. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. For best practice, we conduct a systematic ablation study on various choices in data engineering and multimodal training. The resulting LlaVA-Rad (7B) model attains state-of-the-art results on standard radiology tasks such as report generation and cross-modal retrieval, even outperforming much larger models such as GPT-4V and Med-PaLM M (84B). The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
- [958] arXiv:2403.08901 (replaced) [pdf, ps, other]
-
Title: A Framework for Strategic Discovery of Credible Neural Network Surrogate Models under UncertaintySubjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
The widespread integration of deep neural networks in developing data-driven surrogate models for high-fidelity simulations of complex physical systems highlights the critical necessity for robust uncertainty quantification techniques and credibility assessment methodologies, ensuring the reliable deployment of surrogate models in consequential decision-making. This study presents the Occam Plausibility Algorithm for surrogate models (OPAL-surrogate), providing a systematic framework to uncover predictive neural network-based surrogate models within the large space of potential models, including various neural network classes and choices of architecture and hyperparameters. The framework is grounded in hierarchical Bayesian inferences and employs model validation tests to evaluate the credibility and prediction reliability of the surrogate models under uncertainty. Leveraging these principles, OPAL-surrogate introduces a systematic and efficient strategy for balancing the trade-off between model complexity, accuracy, and prediction uncertainty. The effectiveness of OPAL-surrogate is demonstrated through two modeling problems, including the deformation of porous materials for building insulation and turbulent combustion flow for the ablation of solid fuels within hybrid rocket motors.
- [959] arXiv:2403.09125 (replaced) [pdf, ps, html, other]
-
Title: Exploring the Capabilities and Limitations of Large Language Models in the Electric Energy SectorSubir Majumder, Lin Dong, Fatemeh Doudi, Yuting Cai, Chao Tian, Dileep Kalathi, Kevin Ding, Anupam A. Thatte, Na Li, Le XieSubjects: Systems and Control (eess.SY)
Large Language Models (LLMs) as chatbots have drawn remarkable attention thanks to their versatile capability in natural language processing as well as in a wide range of tasks. While there has been great enthusiasm towards adopting such foundational model-based artificial intelligence tools in all sectors possible, the capabilities and limitations of such LLMs in improving the operation of the electric energy sector need to be explored, and this article identifies fruitful directions in this regard. Key future research directions include data collection systems for fine-tuning LLMs, embedding power system-specific tools in the LLMs, and retrieval augmented generation (RAG)-based knowledge pool to improve the quality of LLM responses and LLMs in safety-critical use cases.
- [960] arXiv:2403.09734 (replaced) [pdf, ps, other]
-
Title: Do Large Language Models Solve ARC Visual Analogies Like People Do?Comments: Changes (based on CogSci 2024 reviewers): - Shortened Intro - Added a table summarizing children performance across age - Added Theoretical discussion in the Discussion section - Corrected the naming of plots - Small clarifications in the Methods sectionSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The Abstraction Reasoning Corpus (ARC) is a visual analogical reasoning test designed for humans and machines (Chollet, 2019). We compared human and large language model (LLM) performance on a new child-friendly set of ARC items. Results show that both children and adults outperform most LLMs on these tasks. Error analysis revealed a similar "fallback" solution strategy in LLMs and young children, where part of the analogy is simply copied. In addition, we found two other error types, one based on seemingly grasping key concepts (e.g., Inside-Outside) and the other based on simple combinations of analogy input matrices. On the whole, "concept" errors were more common in humans, and "matrix" errors were more common in LLMs. This study sheds new light on LLM reasoning ability and the extent to which we can use error analyses and comparisons with human development to understand how LLMs solve visual analogies.
- [961] arXiv:2403.10903 (replaced) [pdf, ps, other]
-
Title: DTOR: Decision Tree Outlier Regressor to explain anomaliesRiccardo Crupi, Daniele Regoli, Alessandro Damiano Sabatino, Immacolata Marano, Massimiliano Brinis, Luca Albertazzi, Andrea Cirillo, Andrea Claudio CosentiniSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Explaining outliers occurrence and mechanism of their occurrence can be extremely important in a variety of domains. Malfunctions, frauds, threats, in addition to being correctly identified, oftentimes need a valid explanation in order to effectively perform actionable counteracts. The ever more widespread use of sophisticated Machine Learning approach to identify anomalies make such explanations more challenging. We present the Decision Tree Outlier Regressor (DTOR), a technique for producing rule-based explanations for individual data points by estimating anomaly scores generated by an anomaly detection model. This is accomplished by first applying a Decision Tree Regressor, which computes the estimation score, and then extracting the relative path associated with the data point score. Our results demonstrate the robustness of DTOR even in datasets with a large number of features. Additionally, in contrast to other rule-based approaches, the generated rules are consistently satisfied by the points to be explained. Furthermore, our evaluation metrics indicate comparable performance to Anchors in outlier explanation tasks, with reduced execution time.
- [962] arXiv:2403.11300 (replaced) [pdf, ps, html, other]
-
Title: Non-Primary Channel Access in IEEE 802.11 UHR: Comprehensive Analysis and EvaluationComments: This work has been submitted to the IEEE for possible publication. 6 pages, 7 figuresSubjects: Networking and Internet Architecture (cs.NI)
The evolution of the IEEE 802.11 standards marks a significant throughput advancement in wireless access technologies, progressively increasing bandwidth capacities from 20 MHz in the IEEE 802.11a to up to 320 MHz in the latest IEEE 802.11be (Wi-Fi 7). However, the increased bandwidth capacities may not be well exploited due to inefficient bandwidth utilization on multiple channels. This issue typically occurs when the primary channel is busy, secondary channels (also known as non-primary channels) are prevented from being utilized even if they are idle, thereby wasting the available bandwidth. This paper investigates the fundamentals of the Non-Primary Channel Access (NPCA) protocol that was defined in IEEE 802.11 Ultra-High Reliability (UHR) group to cope with the above issue. We develop a novel analytical model to assess NPCA protocol performance in terms of the average throughput and delay. Via simulation, we verify that the NPCA network outperforms the legacy network by increasing at least 50% average throughput while reducing at least 40% average delay.
- [963] arXiv:2403.12418 (replaced) [pdf, ps, other]
-
Title: STG-Mamba: Spatial-Temporal Graph Learning via Selective State Space ModelSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Spatial-Temporal Graph (STG) data is characterized as dynamic, heterogenous, and non-stationary, leading to the continuous challenge of spatial-temporal graph learning. In the past few years, various GNN-based methods have been proposed to solely focus on mimicking the relationships among node individuals of the STG network, ignoring the significance of modeling the intrinsic features that exist in STG system over time. In contrast, modern Selective State Space Models (SSSMs) present a new approach which treat STG Network as a system, and meticulously explore the STG system's dynamic state evolution across temporal dimension. In this work, we introduce Spatial-Temporal Graph Mamba (STG-Mamba) as the first exploration of leveraging the powerful selective state space models for STG learning by treating STG Network as a system, and employing the Spatial-Temporal Selective State Space Module (ST-S3M) to precisely focus on the selected STG latent features. Furthermore, to strengthen GNN's ability of modeling STG data under the setting of selective state space models, we propose Kalman Filtering Graph Neural Networks (KFGN) for dynamically integrate and upgrade the STG embeddings from different temporal granularities through a learnable Kalman Filtering statistical theory-based approach. Extensive empirical studies are conducted on three benchmark STG forecasting datasets, demonstrating the performance superiority and computational efficiency of STG-Mamba. It not only surpasses existing state-of-the-art methods in terms of STG forecasting performance, but also effectively alleviate the computational bottleneck of large-scale graph networks in reducing the computational cost of FLOPs and test inference time. The implementation code is available at: \url{this https URL}.
- [964] arXiv:2403.13039 (replaced) [pdf, ps, other]
-
Title: Emotic Masked Autoencoder with Attention Fusion for Facial Expression RecognitionComments: 6 pages; added references for section 1; corrected typo for email authorSubjects: Computer Vision and Pattern Recognition (cs.CV)
Facial Expression Recognition (FER) is a critical task within computer vision with diverse applications across various domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification, particularly showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. By utilizing low-level feature information from the ipsilateral view (auxiliary view) before learning the high-level feature that emphasizes the shift in the human facial expression, our work seeks to provide a straightforward yet innovative way to improve the examined view (main view). We also suggest easy-to-implement and no-training frameworks aimed at highlighting key facial features to determine if such features can serve as guides for the model, focusing on pivotal local elements. The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset, as observed in both training and validation contexts.
- [965] arXiv:2403.14280 (replaced) [pdf, ps, other]
-
Title: Large Language Models for Blockchain Security: A Systematic Literature ReviewSubjects: Cryptography and Security (cs.CR)
Large Language Models (LLMs) have emerged as powerful tools across various domains within cyber security. Notably, recent studies are increasingly exploring LLMs applied to the context of blockchain security (BS). However, there remains a gap in a comprehensive understanding regarding the full scope of applications, impacts, and potential constraints of LLMs on blockchain security. To fill this gap, we undertake a literature review focusing on the studies that apply LLMs in blockchain security (LLM4BS).
Our study aims to comprehensively analyze and understand existing research, and elucidate how LLMs contribute to enhancing the security of blockchain systems. Through a thorough examination of existing literature, we delve into the integration of LLMs into various aspects of blockchain security. We explore the mechanisms through which LLMs can bolster blockchain security, including their applications in smart contract auditing, transaction anomaly detection, vulnerability repair, program analysis of smart contracts, and serving as participants in the cryptocurrency community. Furthermore, we assess the challenges and limitations associated with leveraging LLMs for enhancing blockchain security, considering factors such as scalability, privacy concerns, and ethical concerns. Our thorough review sheds light on the opportunities and potential risks of tasks on LLM4BS, providing valuable insights for researchers, practitioners, and policymakers alike. - [966] arXiv:2403.14297 (replaced) [pdf, ps, other]
-
Title: Impact Assessment of Missing Data in Model Predictions for Earth Observation ApplicationsComments: Accepted at IEEE International Geoscience and Remote Sensing Symposium 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Earth observation (EO) applications involving complex and heterogeneous data sources are commonly approached with machine learning models. However, there is a common assumption that data sources will be persistently available. Different situations could affect the availability of EO sources, like noise, clouds, or satellite mission failures. In this work, we assess the impact of missing temporal and static EO sources in trained models across four datasets with classification and regression tasks. We compare the predictive quality of different methods and find that some are naturally more robust to missing data. The Ensemble strategy, in particular, achieves a prediction robustness up to 100%. We evidence that missing scenarios are significantly more challenging in regression than classification tasks. Finally, we find that the optical view is the most critical view when it is missing individually.
- [967] arXiv:2403.14352 (replaced) [pdf, ps, other]
-
Title: Accelerating Time-to-Science by Streaming Detector Data Directly into Perlmutter Compute NodesSubjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement of large data sets from acquisition systems to HPC centers introduces bottlenecks owing to storage I/O at both ends. This manuscript introduces a streaming workflow designed for an high data rate electron detector that streams data directly to compute node memory at the National Energy Research Scientific Computing Center (NERSC), thereby avoiding storage I/O. The new workflow deploys ZeroMQ-based services for data production, aggregation, and distribution for on-the-fly processing, all coordinated through a distributed key-value store. The system is integrated with the detector's science gateway and utilizes the NERSC Superfacility API to initiate streaming jobs through a web-based frontend. Our approach achieves up to a 14-fold increase in data throughput and enhances predictability and reliability compared to a I/O-heavy file-based transfer workflow. Our work highlights the transformative potential of streaming workflows to expedite data analysis for time-sensitive experiments.
- [968] arXiv:2403.14381 (replaced) [pdf, ps, html, other]
-
Title: Editing Knowledge Representation of Language Model via Rephrased Prefix PromptsComments: 19pages,3figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Neural language models (LMs) have been extensively trained on vast corpora to store factual knowledge about various aspects of the world described in texts. Current technologies typically employ knowledge editing methods or specific prompts to modify LM outputs. However, existing knowledge editing methods are costly and inefficient, struggling to produce appropriate text. Additionally, prompt engineering is opaque and requires significant effort to find suitable prompts. To address these issues, we introduce a new method called PSPEM (Prefix Soft Prompt Editing Method), that can be used for a lifetime with just one training. It resolves the inefficiencies and generalizability issues in knowledge editing methods and overcomes the opacity of prompt engineering by automatically seeking optimal soft prompts. Specifically, PSPEM utilizes a prompt encoder and an encoding converter to refine key information in prompts and uses prompt alignment techniques to guide model generation, ensuring text consistency and adherence to the intended structure and content, thereby maintaining an optimal balance between efficiency and accuracy. We have validated the effectiveness of PSPEM through knowledge editing and attribute inserting. On the COUNTERFACT dataset, PSPEM achieved nearly 100\% editing accuracy and demonstrated the highest level of fluency. We further analyzed the similarities between PSPEM and original prompts and their impact on the model's internals. The results indicate that PSPEM can serve as an alternative to original prompts, supporting the model in effective editing.
- [969] arXiv:2403.14421 (replaced) [pdf, ps, other]
-
Title: DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-TuningJonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan GuoSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replica of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our \emph{differentially private retrieval-augmented diffusion model} (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10,000$ queries.
- [970] arXiv:2403.14667 (replaced) [pdf, ps, other]
-
Title: Weaponization of Conscience in Cybercrime and Online Fraud: A Novel Systems TheoryComments: Updated to include more recent literature, added note that diagrams are author's own work, added two additional diagrams illustrating the examples, expanded the explanation of the concept and its applicability for practitionersSubjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
This article introduces the concept of weaponization of conscience as a complex system and tactic employed by fraudsters to camouflage their activity, coerce others, or to deceive their victims. This study adopts a conceptual approach, drawing from the theoretical underpinnings of military propaganda and psychological operations doctrines and adapting them to serve as a lens through which to understand and defend against weaponization of conscience.
- [971] arXiv:2403.14903 (replaced) [pdf, ps, other]
-
Title: Modeling Distributed Computing Infrastructures for HEP ApplicationsMaximilian Horzela, Henri Casanova, Manuel Giffels, Artur Gottmann, Robin Hofsaess, Günter Quast, Simone Rossi Tisbeni, Achim Streit, Frédéric SuterComments: 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023)Journal-ref: EPJ Web of Conf. 295 (2024) 04032Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Physics - Experiment (hep-ex)
Predicting the performance of various infrastructure design options in complex federated infrastructures with computing sites distributed over a wide area network that support a plethora of users and workflows, such as the Worldwide LHC Computing Grid (WLCG), is not trivial. Due to the complexity and size of these infrastructures, it is not feasible to deploy experimental test-beds at large scales merely for the purpose of comparing and evaluating alternate designs. An alternative is to study the behaviours of these systems using simulation. This approach has been used successfully in the past to identify efficient and practical infrastructure designs for High Energy Physics (HEP). A prominent example is the Monarc simulation framework, which was used to study the initial structure of the WLCG. New simulation capabilities are needed to simulate large-scale heterogeneous computing systems with complex networks, data access and caching patterns. A modern tool to simulate HEP workloads that execute on distributed computing infrastructures based on the SimGrid and WRENCH simulation frameworks is outlined. Studies of its accuracy and scalability are presented using HEP as a case-study. Hypothetical adjustments to prevailing computing architectures in HEP are studied providing insights into the dynamics of a part of the WLCG and candidates for improvements.
- [972] arXiv:2403.16833 (replaced) [pdf, ps, other]
-
Title: Double skew cyclic codes over $\mathbb{F}_q+v\mathbb{F}_q$Comments: 20Subjects: Information Theory (cs.IT)
In this study, in order to get better codes, we focus on double skew cyclic codes over the ring $\mathrm{R}= \mathbb{F}_q+v\mathbb{F}_q, ~v^2=v$ where $q$ is a prime power. We investigate the generator polynomials, minimal spanning sets, generator matrices, and the dual codes over the ring $\mathrm{R}$. As an implementation, the obtained results are illustrated with some good examples. Moreover, we introduce a construction for new generator matrices and thus achieve codes with improved parameters compared to those found in existing literature. Finally, we tabulate our obtained block codes over the ring $\mathrm{R}$.
- [973] arXiv:2403.17367 (replaced) [pdf, ps, other]
-
Title: RoboDuet: A Framework Affording Mobile-Manipulation and Cross-EmbodimentGuoping Pan, Qingwei Ben, Zhecheng Yuan, Guangqi Jiang, Yandong Ji, Jiangmiao Pang, Houde Liu, Huazhe XuSubjects: Robotics (cs.RO)
Combining the mobility of legged robots with the manipulation skills of arms has the potential to significantly expand the operational range and enhance the capabilities of robotic systems in performing various mobile manipulation tasks. Existing approaches are confined to imprecise six degrees of freedom (DoF) manipulation and possess a limited arm workspace. In this paper, we propose a novel framework, RoboDuet, which employs two collaborative policies to realize locomotion and manipulation simultaneously, achieving whole-body control through interactions between each other. Surprisingly, going beyond the large-range pose tracking, we find that the two-policy framework may enable cross-embodiment deployment such as using different quadrupedal robots or other arms. Our experiments demonstrate that the policies trained through RoboDuet can accomplish stable gaits, agile 6D end-effector pose tracking, and zero-shot exchange of legged robots, and can be deployed in the real world to perform various mobile manipulation tasks. Our project page with demo videos is at this https URL .
- [974] arXiv:2403.18643 (replaced) [pdf, ps, other]
-
Title: Sampling-Based Motion Planning with Online Racing Line Generation for Autonomous Driving on Three-Dimensional Race TracksComments: 8 pages, accepted to be published at the 35th IEEE Intelligent Vehicles Symposium, June 2 - 5, 2024, Jeju Shinhwa World, Jeju Island, KoreaSubjects: Robotics (cs.RO)
Existing approaches to trajectory planning for autonomous racing employ sampling-based methods, generating numerous jerk-optimal trajectories and selecting the most favorable feasible trajectory based on a cost function penalizing deviations from an offline-calculated racing line. While successful on oval tracks, these methods face limitations on complex circuits due to the simplistic geometry of jerk-optimal edges failing to capture the complexity of the racing line. Additionally, they only consider two-dimensional tracks, potentially neglecting or surpassing the actual dynamic potential. In this paper, we present a sampling-based local trajectory planning approach for autonomous racing that can maintain the lap time of the racing line even on complex race tracks and consider the race track's three-dimensional effects. In simulative experiments, we demonstrate that our approach achieves lower lap times and improved utilization of dynamic limits compared to existing approaches. We also investigate the impact of online racing line generation, in which the time-optimal solution is planned from the current vehicle state for a limited spatial horizon, in contrast to a closed racing line calculated offline. We show that combining the sampling-based planner with the online racing line generation can significantly reduce lap times in multi-vehicle scenarios.
- [975] arXiv:2403.19234 (replaced) [pdf, ps, other]
-
Title: Regularized dynamical parametric approximationSubjects: Numerical Analysis (math.NA)
This paper studies the numerical approximation of evolution equations by nonlinear parametrizations $u(t)=\Phi(q(t))$ with time-dependent parameters $q(t)$, which are to be determined in the computation. The motivation comes from approximations in quantum dynamics by multiple Gaussians and approximations of various dynamical problems by tensor networks and neural networks. In all these cases, the parametrization is typically irregular: the derivative $\Phi'(q)$ can have arbitrarily small singular values and may have varying rank. We derive approximation results for a regularized approach in the time-continuous case as well as in time-discretized cases. With a suitable choice of the regularization parameter and the time stepsize, the approach can be successfully applied in irregular situations, even though it runs counter to the basic principle in numerical analysis to avoid solving ill-posed subproblems when aiming for a stable algorithm. Numerical experiments with sums of Gaussians for approximating quantum dynamics and with neural networks for approximating the flow map of a system of ordinary differential equations illustrate and complement the theoretical results.
- [976] arXiv:2403.19303 (replaced) [pdf, ps, other]
-
Title: Developing generative AI chatbots conceptual framework for higher educationComments: 28 pagesSubjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR)
This research explores the quickly changing field of generative artificial intelligence (GAI) chatbots in higher education, an industry that is undergoing major technological changes. AI chatbots, such as ChatGPT, HuggingChat, and Google Bard, are becoming more and more common in a variety of sectors, including education. Their acceptance is still in its early phases, with a variety of prospects and obstacles. However, their potential in higher education is particularly noteworthy, providing lecturers and students with affordable, individualized support. Creating a comprehensive framework to aid the usage of generative AI chatbots in higher education institutions (HEIs) is the aim of this project. The Generative AI Chatbots Acceptance Model (GAICAM) is the result of this study's synthesis of elements from well-known frameworks, including the TAM, UTAUT2, TPB, and others along with variables like optimism, innovativeness, discomfort, insecurity, and others. Using a research method that encompasses a comprehensive analysis of extant literature from databases such as IEEE, ACM, ScienceDirect, and Google Scholar, the study aims to comprehend the implications of AI Chatbots on higher education and pinpoint critical elements for their efficacious implementation. Peer-reviewed English-language publications published between 2020 and 2023 with a focus on the use of AI chatbots in higher education were the main focus of the search criteria. The results demonstrate how much AI chatbots can do to improve student engagement, streamline the educational process, and support administrative and research duties. But there are also clear difficulties, such as unfavorable student sentiments, doubts about the veracity of material produced by AI, and unease and nervousness with new technologies.
- [977] arXiv:2404.00971 (replaced) [pdf, ps, html, other]
-
Title: Exploring and Evaluating Hallucinations in LLM-Powered Code GenerationSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
The rise of Large Language Models (LLMs) has significantly advanced many applications on software engineering tasks, particularly in code generation. Despite the promising performance, LLMs are prone to generate hallucinations, which means LLMs might produce outputs that deviate from users' intent, exhibit internal inconsistencies, or misalign with the factual knowledge, making the deployment of LLMs potentially risky in a wide range of applications. Existing work mainly focuses on investing the hallucination in the domain of natural language generation (NLG), leaving a gap in understanding the types and extent of hallucinations in the context of code generation. To bridge the gap, we conducted a thematic analysis of the LLM-generated code to summarize and categorize the hallucinations present in it. Our study established a comprehensive taxonomy of hallucinations in LLM-generated code, encompassing 5 primary categories of hallucinations depending on the conflicting objectives and varying degrees of deviation observed in code generation. Furthermore, we systematically analyzed the distribution of hallucinations, exploring variations among different LLMs and their correlation with code correctness. Based on the results, we proposed HalluCode, a benchmark for evaluating the performance of code LLMs in recognizing hallucinations. Hallucination recognition and mitigation experiments with HalluCode and HumanEval show existing LLMs face great challenges in recognizing hallucinations, particularly in identifying their types, and are hardly able to mitigate hallucinations. We believe our findings will shed light on future research about hallucination evaluation, detection, and mitigation, ultimately paving the way for building more effective and reliable code LLMs in the future.
- [978] arXiv:2404.01461 (replaced) [pdf, ps, other]
-
Title: Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMsComments: work in progressSubjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Although large language models (LLMs) have demonstrated remarkable proficiency in "understanding" text and generating human-like text, they may exhibit biases acquired from training data in doing so. Specifically, LLMs may be susceptible to a common cognitive trap in human decision-making called the representativeness heuristic. This is a concept in psychology that refers to judging the likelihood of an event based on how closely it resembles a well-known prototype or typical example versus considering broader facts or statistical evidence. This work investigates the impact of the representativeness heuristic on LLM reasoning. We created ReHeAT (Representativeness Heuristic AI Testing), a dataset containing a series of problems spanning six common types of representativeness heuristics. Experiments reveal that four LLMs applied to REHEAT all exhibited representativeness heuristic biases. We further identify that the model's reasoning steps are often incorrectly based on a stereotype rather than the problem's description. Interestingly, the performance improves when adding a hint in the prompt to remind the model of using its knowledge. This suggests the uniqueness of the representativeness heuristic compared to traditional biases. It can occur even when LLMs possess the correct knowledge while failing in a cognitive trap. This highlights the importance of future research focusing on the representativeness heuristic in model reasoning and decision-making and on developing solutions to address it.
- [979] arXiv:2404.01703 (replaced) [pdf, ps, other]
-
Title: Boosting Visual Recognition in Real-world Degradations via Unsupervised Feature Enhancement Module with Deep Channel PriorComments: 14 pages, 14 figures, publised to TIV2024Journal-ref: IEEE Transactions on Intelligent Vehicles, April 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
The environmental perception of autonomous vehicles in normal conditions have achieved considerable success in the past decade. However, various unfavourable conditions such as fog, low-light, and motion blur will degrade image quality and pose tremendous threats to the safety of autonomous driving. That is, when applied to degraded images, state-of-the-art visual models often suffer performance decline due to the feature content loss and artifact interference caused by statistical and structural properties disruption of captured images. To address this problem, this work proposes a novel Deep Channel Prior (DCP) for degraded visual recognition. Specifically, we observe that, in the deep representation space of pre-trained models, the channel correlations of degraded features with the same degradation type have uniform distribution even if they have different content and semantics, which can facilitate the mapping relationship learning between degraded and clear representations in high-sparsity feature space. Based on this, a novel plug-and-play Unsupervised Feature Enhancement Module (UFEM) is proposed to achieve unsupervised feature correction, where the multi-adversarial mechanism is introduced in the first stage of UFEM to achieve the latent content restoration and artifact removal in high-sparsity feature space. Then, the generated features are transferred to the second stage for global correlation modulation under the guidance of DCP to obtain high-quality and recognition-friendly features. Evaluations of three tasks and eight benchmark datasets demonstrate that our proposed method can comprehensively improve the performance of pre-trained models in real degradation conditions. The source code is available at this https URL
- [980] arXiv:2404.01714 (replaced) [pdf, ps, html, other]
-
Title: Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep LearningComments: 32 pages, 13 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
Training deep neural networks is a challenging task. In order to speed up training and enhance the performance of deep neural networks, we rectify the vanilla conjugate gradient as conjugate-gradient-like and incorporate it into the generic Adam, and thus propose a new optimization algorithm named CG-like-Adam for deep learning. Specifically, both the first-order and the second-order moment estimation of generic Adam are replaced by the conjugate-gradient-like. Convergence analysis handles the cases where the exponential moving average coefficient of the first-order moment estimation is constant and the first-order moment estimation is unbiased. Numerical experiments show the superiority of the proposed algorithm based on the CIFAR10/100 dataset.
- [981] arXiv:2404.01741 (replaced) [pdf, ps, other]
-
Title: Intrusion Tolerance for Networked Systems through Two-Level Feedback ControlComments: Preprint to appear in the conference proceedings of the 54th IEEE/IFIP Dependable Systems and Networks Conference (DSN'24)Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)
We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.
- [982] arXiv:2404.01758 (replaced) [pdf, ps, other]
-
Title: GEARS: Local Geometry-aware Hand-object Interaction SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV)
Generating realistic hand motion sequences in interaction with objects has gained increasing attention with the growing interest in digital humans. Prior work has illustrated the effectiveness of employing occupancy-based or distance-based virtual sensors to extract hand-object interaction features. Nonetheless, these methods show limited generalizability across object categories, shapes and sizes. We hypothesize that this is due to two reasons: 1) the limited expressiveness of employed virtual sensors, and 2) scarcity of available training data. To tackle this challenge, we introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions. The sensor queries for object surface points in the neighbourhood of each hand joint. As an important step towards mitigating the learning complexity, we transform the points from global frame to hand template frame and use a shared module to process sensor features of each individual joint. This is followed by a spatio-temporal transformer network aimed at capturing correlation among the joints in different dimensions. Moreover, we devise simple heuristic rules to augment the limited training sequences with vast static hand grasping samples. This leads to a broader spectrum of grasping types observed during training, in turn enhancing our model's generalization capability. We evaluate on two public datasets, GRAB and InterCap, where our method shows superiority over baselines both quantitatively and perceptually.
- [983] arXiv:2404.02090 (replaced) [pdf, ps, other]
-
Title: Already Moderate Population Sizes Provably Yield Strong Robustness to NoiseComments: Full version of the same-titled paper accepted at GECCO 2024Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Experience shows that typical evolutionary algorithms can cope well with stochastic disturbances such as noisy function evaluations.
In this first mathematical runtime analysis of the $(1+\lambda)$ and $(1,\lambda)$ evolutionary algorithms in the presence of prior bit-wise noise, we show that both algorithms can tolerate constant noise probabilities without increasing the asymptotic runtime on the OneMax benchmark. For this, a population size $\lambda$ suffices that is at least logarithmic in the problem size $n$. The only previous result in this direction regarded the less realistic one-bit noise model, required a population size super-linear in the problem size, and proved a runtime guarantee roughly cubic in the noiseless runtime for the OneMax benchmark. Our significantly stronger results are based on the novel proof argument that the noiseless offspring can be seen as a biased uniform crossover between the parent and the noisy offspring. We are optimistic that the technical lemmas resulting from this insight will find applications also in future mathematical runtime analyses of evolutionary algorithms. - [984] arXiv:2404.02499 (replaced) [pdf, ps, other]
-
Title: Learning Generalized Policies for Fully Observable Non-Deterministic Planning DomainsComments: presented at IJCAI'24Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning general policies over fully observable, non-deterministic (FOND) domains. We also evaluate the resulting approach experimentally over a number of benchmark domains in FOND planning, present the general policies that result in some of these domains, and prove their correctness. The method for learning general policies for FOND planning can actually be seen as an alternative FOND planning method that searches for solutions, not in the given state space but in an abstract space defined by features that must be learned as well.
- [985] arXiv:2404.03381 (replaced) [pdf, ps, other]
-
Title: Learning to Plan and Generate Text with CitationsConstanza Fierro, Reinald Kim Amplayo, Fantine Huot, Nicola De Cao, Joshua Maynez, Shashi Narayan, Mirella LapataSubjects: Computation and Language (cs.CL)
The increasing demand for the deployment of LLMs in information-seeking scenarios has spurred efforts in creating verifiable systems, which generate responses to queries along with supporting evidence. In this paper, we explore the attribution capabilities of plan-based models which have been recently shown to improve the faithfulness, grounding, and controllability of generated text. We conceptualize plans as a sequence of questions which serve as blueprints of the generated content and its organization. We propose two attribution models that utilize different variants of blueprints, an abstractive model where questions are generated from scratch, and an extractive model where questions are copied from the input. Experiments on long-form question-answering show that planning consistently improves attribution quality. Moreover, the citations generated by blueprint models are more accurate compared to those obtained from LLM-based pipelines lacking a planning component.
- [986] arXiv:2404.04292 (replaced) [pdf, ps, other]
-
Title: Conversational Disease Diagnosis via External Planner-Controlled Large Language ModelsComments: Work in ProgressSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The development of large language models (LLMs) has brought unprecedented possibilities for artificial intelligence (AI) based medical diagnosis. However, the application perspective of LLMs in real diagnostic scenarios is still unclear because they are not adept at collecting patient data proactively. This study presents a LLM-based diagnostic system that enhances planning capabilities by emulating doctors. Our system involves two external planners to handle planning tasks. The first planner employs a reinforcement learning approach to formulate disease screening questions and conduct initial diagnoses. The second planner uses LLMs to parse medical guidelines and conduct differential diagnoses. By utilizing real patient electronic medical record data, we constructed simulated dialogues between virtual patients and doctors and evaluated the diagnostic abilities of our system. We demonstrate that our system significantly surpasses existing models, including GPT-4 Turbo, in both disease screening and differential diagnoses. This research represents a step towards more seamlessly integrating AI into clinical settings, potentially enhancing the accuracy and accessibility of medical diagnostics.
- [987] arXiv:2404.05365 (replaced) [pdf, ps, html, other]
-
Title: NLP Progress in Indigenous Latin American LanguagesAtnafu Lambebo Tonja, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar SolorioComments: Accepted at NAACL 2024Subjects: Computation and Language (cs.CL)
The paper focuses on the marginalization of indigenous language communities in the face of rapid technological advancements. We highlight the cultural richness of these languages and the risk they face of being overlooked in the realm of Natural Language Processing (NLP). We aim to bridge the gap between these communities and researchers, emphasizing the need for inclusive technological advancements that respect indigenous community perspectives. We show the NLP progress of indigenous Latin American languages and the survey that covers the status of indigenous languages in Latin America, their representation in NLP, and the challenges and innovations required for their preservation and development. The paper contributes to the current literature in understanding the need and progress of NLP for indigenous communities of Latin America, specifically low-resource and indigenous communities in general.
- [988] arXiv:2404.05576 (replaced) [pdf, ps, other]
-
Title: Dynamic Backtracking in GFlowNets: Enhancing Decision Steps with Reward-Dependent Adjustment MechanismsSubjects: Machine Learning (cs.LG)
Generative Flow Networks (GFlowNets or GFNs) are probabilistic models predicated on Markov flows, and they employ specific amortization algorithms to learn stochastic policies that generate compositional substances including biomolecules, chemical materials, etc. With a strong ability to generate high-performance biochemical molecules, GFNs accelerate the discovery of scientific substances, effectively overcoming the time-consuming, labor-intensive, and costly shortcomings of conventional material discovery methods. However, previous studies rarely focus on accumulating exploratory experience by adjusting generative structures, which leads to disorientation in complex sampling spaces. Efforts to address this issue, such as LS-GFN, are limited to local greedy searches and lack broader global adjustments. This paper introduces a novel variant of GFNs, the Dynamic Backtracking GFN (DB-GFN), which improves the adaptability of decision-making steps through a reward-based dynamic backtracking mechanism. DB-GFN allows backtracking during the network construction process according to the current state's reward value, thereby correcting disadvantageous decisions and exploring alternative pathways during the exploration process. When applied to generative tasks involving biochemical molecules and genetic material sequences, DB-GFN outperforms GFN models such as LS-GFN and GTB, as well as traditional reinforcement learning methods, in sample quality, sample exploration quantity, and training convergence speed. Additionally, owing to its orthogonal nature, DB-GFN shows great potential in future improvements of GFNs, and it can be integrated with other strategies to achieve higher search performance.
- [989] arXiv:2404.06347 (replaced) [pdf, ps, other]
-
Title: RAR-b: Reasoning as Retrieval BenchmarkComments: v2, small typo fixesSubjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Semantic textual similartiy (STS) and information retrieval tasks (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envision the need to evaluate next-level language understanding abilities of embedding models, and take a conscious look at the reasoning abilities stored in them. Addressing this, we pose the question: Can retrievers solve reasoning problems? By transforming reasoning tasks into retrieval tasks, we find that without specifically trained for reasoning-level language understanding, current state-of-the-art retriever models may still be far from being competent for playing the role of assisting LLMs, especially in reasoning-intensive tasks. Moreover, albeit trained to be aware of instructions, instruction-aware IR models are often better off without instructions in inference time for reasoning tasks, posing an overlooked retriever-LLM behavioral gap for the research community to align. However, recent decoder-based embedding models show great promise in narrowing the gap, highlighting the pathway for embedding models to achieve reasoning-level language understanding. We also show that, although current off-the-shelf re-ranker models fail on these tasks, injecting reasoning abilities into them through fine-tuning still appears easier than doing so to bi-encoders, and we are able to achieve state-of-the-art performance across all tasks by fine-tuning a reranking model. We release Reasoning as Retrieval Benchmark (RAR-b), a holistic suite of tasks and settings to evaluate the reasoning abilities stored in retriever models. RAR-b is available at this https URL.
- [990] arXiv:2404.06570 (replaced) [pdf, ps, html, other]
-
Title: MORPHeus: a Multimodal One-armed Robot-assisted Peeling System with Human Users In-the-loopSubjects: Robotics (cs.RO)
Meal preparation is an important instrumental activity of daily living~(IADL). While existing research has explored robotic assistance in meal preparation tasks such as cutting and cooking, the crucial task of peeling has received less attention. Robot-assisted peeling, conventionally a bimanual task, is challenging to deploy in the homes of care recipients using two wheelchair-mounted robot arms due to ergonomic and transferring challenges. This paper introduces a robot-assisted peeling system utilizing a single robotic arm and an assistive cutting board, inspired by the way individuals with one functional hand prepare meals. Our system incorporates a multimodal active perception module to determine whether an area on the food is peeled, a human-in-the-loop long-horizon planner to perform task planning while catering to a user's preference for peeling coverage, and a compliant controller to peel the food items. We demonstrate the system on 12 food items representing the extremes of different shapes, sizes, skin thickness, surface textures, skin vs flesh colors, and deformability.
- [991] arXiv:2404.07932 (replaced) [pdf, ps, other]
-
Title: FusionMamba: Efficient Image Fusion with State Space ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fields are limited, restricting their capacity to capture global context. Conversely, Transformers excel at learning global information but are hindered by their quadratic complexity. Fortunately, recent advancements in the State Space Model (SSM), particularly Mamba, offer a promising solution to this issue by enabling global awareness with linear complexity. However, there have been few attempts to explore the potential of the SSM in information fusion, which is a crucial ability in domains like image fusion. Therefore, we propose FusionMamba, an innovative method for efficient image fusion. Our contributions mainly focus on two aspects. Firstly, recognizing that images from different sources possess distinct properties, we incorporate Mamba blocks into two U-shaped networks, presenting a novel architecture that extracts spatial and spectral features in an efficient, independent, and hierarchical manner. Secondly, to effectively combine spatial and spectral information, we extend the Mamba block to accommodate dual inputs. This expansion leads to the creation of a new module called the FusionMamba block, which outperforms existing fusion techniques such as concatenation and cross-attention. We conduct a series of experiments on five datasets related to three image fusion tasks. The quantitative and qualitative evaluation results demonstrate that our method achieves SOTA performance, underscoring the superiority of FusionMamba. The code is available at this https URL.
- [992] arXiv:2404.07982 (replaced) [pdf, ps, other]
-
Title: Language Imbalance Can Boost Cross-lingual GeneralisationSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Multilinguality is crucial for extending recent advancements in language modelling to diverse linguistic communities. To maintain high performance while representing multiple languages, multilingual models ideally align representations, allowing what is learned in one language to generalise to others. Prior research has emphasised the importance of parallel data and shared vocabulary elements as key factors for such alignment. In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance. In controlled experiments on perfectly equivalent cloned languages, we observe that the existence of a predominant language during training boosts the performance of less frequent languages and leads to stronger alignment of model representations across languages. Furthermore, we find that this trend is amplified with scale: with large enough models or long enough training, we observe that bilingual training data with a 90/10 language split yields better performance on both languages than a balanced 50/50 split. Building on these insights, we design training schemes that can improve performance in all cloned languages, even without altering the training data. As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation there is not conclusive.
- [993] arXiv:2404.09421 (replaced) [pdf, ps, html, other]
-
Title: Two methods addressing variable-exponent fractional initial and boundary value problems and Abel integral equationSubjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Variable-exponent fractional models attract increasing attentions in various applications, while the rigorous analysis is far from well developed. This work provides general tools to address these models. Specifically, we first develop a convolution method to study the well-posedness, regularity, an inverse problem and numerical approximation for the sundiffusion of variable exponent. For models such as the variable-exponent two-sided space-fractional boundary value problem (including the variable-exponent fractional Laplacian equation as a special case) and the distributed variable-exponent model, for which the convolution method does not apply, we develop a perturbation method to prove their well-posedness. The relation between the convolution method and the perturbation method is discussed, and we further apply the latter to prove the well-posedness of the variable-exponent Abel integral equation and discuss the constraint on the data under different initial values of variable exponent.
- [994] arXiv:2404.09957 (replaced) [pdf, ps, other]
-
Title: How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything ModelComments: Code available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Automated segmentation is a fundamental medical image analysis task, which enjoys significant advances due to the advent of deep learning. While foundation models have been useful in natural language processing and some vision tasks for some time, the foundation model developed with image segmentation in mind - Segment Anything Model (SAM) - has been developed only recently and has shown similar promise. However, there are still no systematic analyses or "best-practice" guidelines for optimal fine-tuning of SAM for medical image segmentation. This work summarizes existing fine-tuning strategies with various backbone architectures, model components, and fine-tuning algorithms across 18 combinations, and evaluates them on 17 datasets covering all common radiology modalities. Our study reveals that (1) fine-tuning SAM leads to slightly better performance than previous segmentation methods, (2) fine-tuning strategies that use parameter-efficient learning in both the encoder and decoder are superior to other strategies, (3) network architecture has a small impact on final performance, (4) further training SAM with self-supervised learning can improve final model performance. We also demonstrate the ineffectiveness of some methods popular in the literature and further expand our experiments into few-shot and prompt-based settings. Lastly, we released our code and MRI-specific fine-tuned weights, which consistently obtained superior performance over the original SAM, at this https URL.
- [995] arXiv:2404.10281 (replaced) [pdf, ps, other]
-
Title: AI-Assisted Writing in Education: Ecosystem Risks and MitigationsSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
While the excitement around the capabilities of technological advancements is giving rise to new AI-based writing assistants, the overarching ecosystem plays a crucial role in how they are adopted in educational practice. In this paper, we point to key ecological aspects for consideration. We draw insights from extensive research integrated with practice on a writing feedback tool over 9 years at a university, and we highlight potential risks when these are overlooked. It informs the design of educational writing support tools to be better aligned within broader contexts to balance innovation with practical impact.
- [996] arXiv:2404.10664 (replaced) [pdf, ps, other]
-
Title: Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification TasksComments: 13 pages, 12 figures, 13th International conference on innovative technologies in the field of science, engineering and technologySubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Images captured from the real world are often affected by different types of noise, which can significantly impact the performance of Computer Vision systems and the quality of visual data. This study presents a novel approach for defect detection in casting product noisy images, specifically focusing on submersible pump impellers. The methodology involves utilizing deep learning models such as VGG16, InceptionV3, and other models in both the spatial and frequency domains to identify noise types and defect status. The research process begins with preprocessing images, followed by applying denoising techniques tailored to specific noise categories. The goal is to enhance the accuracy and robustness of defect detection by integrating noise detection and denoising into the classification pipeline. The study achieved remarkable results using VGG16 for noise type classification in the frequency domain, achieving an accuracy of over 99%. Removal of salt and pepper noise resulted in an average SSIM of 87.9, while Gaussian noise removal had an average SSIM of 64.0, and periodic noise removal yielded an average SSIM of 81.6. This comprehensive approach showcases the effectiveness of the deep AutoEncoder model and median filter, for denoising strategies in real-world industrial applications. Finally, our study reports significant improvements in binary classification accuracy for defect detection compared to previous methods. For the VGG16 classifier, accuracy increased from 94.6% to 97.0%, demonstrating the effectiveness of the proposed noise detection and denoising approach. Similarly, for the InceptionV3 classifier, accuracy improved from 84.7% to 90.0%, further validating the benefits of integrating noise analysis into the classification pipeline.
- [997] arXiv:2404.10703 (replaced) [pdf, ps, html, other]
-
Title: An Empirical Study on Code Review Activity Prediction and Its Impact in PracticeComments: 20 pages + 3 pages refJournal-ref: FSE 2024Subjects: Software Engineering (cs.SE)
During code reviews, an essential step in software quality assurance, reviewers have the difficult task of understanding and evaluating code changes to validate their quality and prevent introducing faults to the codebase. This is a tedious process where the effort needed is highly dependent on the code submitted, as well as the author's and the reviewer's experience, leading to median wait times for review feedback of 15-64 hours. Through an initial user study carried with 29 experts, we found that re-ordering the files changed by a patch within the review environment has potential to improve review quality, as more comments are written (+23%), and participants' file-level hot-spot precision and recall increases to 53% (+13%) and 28% (+8%), respectively, compared to the alphanumeric ordering. Hence, this paper aims to help code reviewers by predicting which files in a submitted patch need to be (1) commented, (2) revised, or (3) are hot-spots (commented or revised). To predict these tasks, we evaluate two different types of text embeddings (i.e., Bag-of-Words and Large Language Models encoding) and review process features (i.e., code size-based and history-based features). Our empirical study on three open-source and two industrial datasets shows that combining the code embedding and review process features leads to better results than the state-of-the-art approach. For all tasks, F1-scores (median of 40-62%) are significantly better than the state-of-the-art (from +1 to +9%).
- [998] arXiv:2404.10976 (replaced) [pdf, ps, other]
-
Title: Group-Aware Coordination Graph for Multi-Agent Reinforcement LearningComments: Accepted by IJCAI 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.
- [999] arXiv:2404.11003 (replaced) [pdf, ps, html, other]
-
Title: InfoMatch: Entropy Neural Estimation for Semi-Supervised Image ClassificationComments: IJCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semi-supervised image classification, leveraging pseudo supervision and consistency regularization, has demonstrated remarkable success. However, the ongoing challenge lies in fully exploiting the potential of unlabeled data. To address this, we employ information entropy neural estimation to utilize the potential of unlabeled samples. Inspired by contrastive learning, the entropy is estimated by maximizing a lower bound on mutual information across different augmented views. Moreover, we theoretically analyze that the information entropy of the posterior of an image classifier is approximated by maximizing the likelihood function of the softmax predictions. Guided by these insights, we optimize our model from both perspectives to ensure that the predicted probability distribution closely aligns with the ground-truth distribution. Given the theoretical connection to information entropy, we name our method InfoMatch. Through extensive experiments, we show its superior performance. The source code is available at this https URL.
- [1000] arXiv:2404.11097 (replaced) [pdf, ps, other]
-
Title: Optimum Achievable Rates in Two Random Number Generation Problems with $f$-Divergences Using Smooth R\'enyi EntropySubjects: Information Theory (cs.IT)
Two typical fixed-length random number generation problems in information theory are considered for general sources. One is the source resolvability problem and the other is the intrinsic randomness problem. In each of these problems, the optimum achievable rate with respect to the given approximation measure is one of our main concerns and has been characterized using two different information quantities: the information spectrum and the smooth Rényi entropy. Recently, optimum achievable rates with respect to $f$-divergences have been characterized using the information spectrum quantity. The $f$-divergence is a general non-negative measure between two probability distributions on the basis of a convex function $f$. The class of f-divergences includes several important measures such as the variational distance, the KL divergence, the Hellinger distance and so on. Hence, it is meaningful to consider the random number generation problems with respect to $f$-divergences. However, optimum achievable rates with respect to $f$-divergences using the smooth Rényi entropy have not been clarified yet in both of two problems. In this paper we try to analyze the optimum achievable rates using the smooth Rényi entropy and to extend the class of $f$-divergence. To do so, we first derive general formulas of the first-order optimum achievable rates with respect to $f$-divergences in both problems under the same conditions as imposed by previous studies. Next, we relax the conditions on $f$-divergence and generalize the obtained general formulas. Then, we particularize our general formulas to several specified functions $f$. As a result, we reveal that it is easy to derive optimum achievable rates for several important measures from our general formulas. Furthermore, a kind of duality between the resolvability and the intrinsic randomness is revealed in terms of the smooth Rényi entropy.