Human-Computer Interaction
- [1] arXiv:2407.00305
Title: Student-AI Interaction: A Case Study of CS1 students
Authors: Matin Amoozadeh, Daye Nam, Daniel Prol, Ali Alfageeh, James Prather, Michael Hilton, Sruti Srinivasa Ragavan, Mohammad Amin Alipour
Subjects: Human-Computer Interaction (cs.HC)
The new capabilities of generative artificial intelligence (Generative AI) tools such as ChatGPT allow users to interact with the system in intuitive ways, such as simple conversations, and receive (mostly) good-quality answers. These systems can support students' learning objectives by providing accessible explanations and examples even for vague queries. At the same time, they can encourage undesired help-seeking behaviors by providing solutions to students' homework. It is therefore important to better understand how students approach such tools and the potential issues such approaches might present for learners. In this paper, we present a case study of student-AI collaboration on programming tasks in a CS1 introductory programming course. To this end, we recruited a gender-balanced, majority non-white group of 15 CS1 students at a large public university in the US and observed them solving programming tasks. We used a mixed-methods approach to study their interactions as they tackled Python programming tasks, focusing on when and why they used ChatGPT for problem-solving. We analyzed and classified the questions the 15 participants submitted to ChatGPT. Additionally, we analyzed user interaction patterns, participants' reactions to ChatGPT's responses, and the potential impacts of Generative AI on their perception of self-efficacy. Our results suggest that in about a third of the cases, students attempted to complete the task by submitting the full task description to ChatGPT without making any effort of their own. We also observed that few students verified their solutions. We discuss the results and their potential implications.
- [2] arXiv:2407.00981
Title: VisEval: A Benchmark for Data Visualization in the Era of Large Language Models
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advances in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. First, we introduce a high-quality, large-scale dataset of 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Second, we advocate a comprehensive automated evaluation methodology covering multiple dimensions, including validity, legality, and readability. By systematically scanning for potential issues with a number of heterogeneous checkers, VisEval provides reliable and trustworthy evaluation outcomes. We run VisEval on a series of state-of-the-art LLMs. Our evaluation reveals prevalent challenges and delivers essential insights for future advancements.
- [3] arXiv:2407.01077
Title: Impact of Social Relationships on Peer Assessment in E-Learning
Comments: 24 pages, 5 figures, 4 tables. Learning Environ Res (2024)
Subjects: Human-Computer Interaction (cs.HC)
Peer assessment has been widely studied as a replacement for traditional evaluation, not only because it reduces professors' workload but mainly because it benefits students' engagement and learning. Although several works successfully validate its accuracy and fairness, more research is needed on how students' pre-existing social relationships affect the grades they give their peers in an e-learning course. We developed a Moodle plugin that adds peer assessment capabilities to the platform's forums and used it in an MSc course. The plugin curated the reviewer set for a post based on the author's relationships and included rubrics to counter the possible interpersonal effects of peer assessment. Results confirm that peer assessment is reliable and accurate for works with at least three peer assessments, although peer grades are slightly higher. The impact of social relationships is noticeable: students who dislike a peer grade that peer's work consistently lower than students with a positive connection do. However, this has little influence on the final aggregate peer grade. Our findings show that peer assessment can replace traditional evaluation in an e-learning environment where students are familiar with each other.
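The abstract does not state how the plugin aggregates peer grades. A minimal sketch, assuming a plain mean that is only reported once the three-assessment threshold mentioned above is reached, plus a simple measure of the positive-versus-negative relationship gap; `aggregate_peer_grade` and `relationship_gap` are hypothetical names, not part of the plugin or of Moodle's API.

```python
from statistics import mean

# Assumption: aggregates are trusted only with at least three peer assessments,
# following the threshold reported in the abstract.
MIN_REVIEWS = 3


def aggregate_peer_grade(grades, min_reviews=MIN_REVIEWS):
    """Return the mean peer grade, or None when too few assessments exist."""
    if len(grades) < min_reviews:
        return None  # too few reviews to treat the aggregate as reliable
    return mean(grades)


def relationship_gap(grades_by_relation):
    """Mean grade from reviewers with a positive tie minus the mean from reviewers with a negative tie."""
    return mean(grades_by_relation["positive"]) - mean(grades_by_relation["negative"])
```

A positive `relationship_gap` corresponds to the effect described above; averaging over several reviewers is one reason such a gap can have little influence on the final aggregate.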
- [4] arXiv:2407.01161
Title: GazeNoter: Co-Piloted AR Note-Taking via Gaze Selection of LLM Suggestions to Match Users' Intentions
Comments: 18 pages, 10 figures
Subjects: Human-Computer Interaction (cs.HC)
Note-taking is critical during speeches and discussions, serving not only for later summarization and organization but also for real-time reminders of questions and opinions during question-and-answer sessions and for timely contributions to discussions. Manually typing notes on smartphones can be distracting and increase users' cognitive load. While large language models (LLMs) are used to automatically generate summaries and highlights, the content generated by artificial intelligence (AI) may not match users' intentions without user input or interaction. Therefore, we propose an AI-copiloted augmented reality (AR) system, GazeNoter, that allows users to swiftly select among diverse LLM-generated suggestions via gaze on an AR headset for real-time note-taking. GazeNoter leverages an AR headset as a medium for users to swiftly adjust the LLM output to match their intentions, forming a user-in-the-loop AI system for both within-context and beyond-context notes. We conducted two user studies to verify the usability of GazeNoter: attending speeches in a static sitting condition, and walking meetings and discussions in a mobile walking condition.
- [5] arXiv:2407.01488
Title: LEXI: Large Language Models Experimentation Interface
Comments: For associated GitHub repository, see this https URL
Subjects: Human-Computer Interaction (cs.HC)
Recent developments in Large Language Models (LLMs) mark a significant moment in the research and development of social interactions with artificial agents. These agents are widely deployed in a variety of settings, with potential impact on users. However, the study of social interactions with LLM-powered agents is still emerging, limited by access to the technology and to data, the absence of standardised interfaces, and the difficulty of establishing controlled experimental setups with the currently available business-oriented platforms. To address these gaps, we developed LEXI (LLMs Experimentation Interface), an open-source tool that enables the deployment of LLM-powered artificial agents in social interaction behavioural experiments. Using a graphical interface, LEXI allows researchers to build agents and deploy them in experimental setups along with forms and questionnaires while collecting interaction logs and self-reported data. LEXI is aimed at improving human-agent interaction (HAI) empirical research methodology while allowing researchers with diverse backgrounds and technical proficiency to deploy LLM-powered artificial agents in HAI behavioural experiments. The outcomes of usability testing indicate LEXI's broad utility, high usability and minimal mental workload requirements, with distinctive benefits observed across disciplines. A proof-of-concept study exploring the tool's efficacy in evaluating social HAIs was conducted, resulting in high-quality data. A comparison of empathetic versus neutral agents indicated that people perceive empathetic agents as more social, and write longer and more positive messages towards them.
New submissions for Tuesday, 2 July 2024 (showing 5 of 5 entries)
- [6] arXiv:2407.00039 (cross-list from q-bio.NC)
Title: Decoding moral judgement from text: a pilot study
Comments: 7 pages, 2 figures, conference
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Moral judgement is a complex human reaction that engages cognitive and emotional dimensions. While some of the neural correlates of morality are known, it is currently unclear whether moral violation can be detected at a single-trial level. In this pilot study, we explore the feasibility of decoding moral judgement from text stimuli with passive brain-computer interfaces. To elicit moral judgement effectively, we use video-audio affective priming before presenting the text stimuli and attribute the text to moral agents. We obtain good accuracy for neutral vs. morally charged trials, but our results show that further effort is necessary to achieve reliable classification between moral congruency and incongruency states. With this research, we try to pave the way towards neuroadaptive human-computer interaction and more human-compatible large language models (LLMs).
- [7] arXiv:2407.00108 (cross-list from cs.LG)
Title: A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling
Comments: Accepted to EAMT 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Incorporating extra-textual context such as film metadata into the machine translation (MT) pipeline can enhance translation quality, as indicated by automatic evaluation in recent work. However, the positive impact of such systems in industry remains unproven. We report on an industrial case study carried out to investigate the benefit of MT in a professional scenario of translating TV subtitles with a focus on how leveraging extra-textual context impacts post-editing. We found that post-editors marked significantly fewer context-related errors when correcting the outputs of MTCue, the context-aware model, as opposed to non-contextual models. We also present the results of a survey of the employed post-editors, which highlights contextual inadequacy as a significant gap consistently observed in MT. Our findings strengthen the motivation for further work within fully contextual MT.
- [8] arXiv:2407.00129 (cross-list from eess.IV)
Title: Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction
Comments: Submitted to the Journal
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Predicting human gaze behavior within computer vision is integral to developing interactive systems that can anticipate user attention, addresses fundamental questions in cognitive science, and has implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Although methods have been introduced for modeling human eye gaze behavior, applying them to scanpath prediction in medical imaging remains unexplored. Our proposed system aims to predict eye gaze sequences from radiology reports and chest X-ray (CXR) images, potentially streamlining data collection and enhancing AI systems using larger datasets. However, predicting human scanpaths on medical images presents unique challenges due to the diverse nature of abnormal regions. Our model predicts the fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models from the computer vision community. Using a two-stage training process and large publicly available datasets, our approach generates static heatmaps and eye gaze videos aligned with radiology reports, facilitating comprehensive analysis. We validate our approach by comparing its performance with state-of-the-art methods and assessing its generalizability across different radiologists, introducing novel strategies to model radiologists' search patterns during CXR image diagnosis. Based on radiologists' evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions of the CXR images. It sometimes also outperforms humans in terms of redundancy and randomness in the scanpaths.
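The abstract does not say how the static heatmaps are rendered from predicted fixations. A common approach, assumed here rather than taken from MedGaze, is to place a duration-weighted isotropic Gaussian at each fixation and normalize the result; `fixation_heatmap` and the `sigma` default are hypothetical.

```python
import math


def fixation_heatmap(fixations, height, width, sigma=2.0):
    """Render (x, y, duration) fixations as a duration-weighted Gaussian heatmap.

    Assumption: each fixation contributes duration * exp(-d^2 / (2 sigma^2)),
    and the map is normalized so its maximum is 1.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y, duration in fixations:
        for r in range(height):
            for c in range(width):
                d2 = (c - x) ** 2 + (r - y) ** 2  # squared pixel distance to the fixation
                grid[r][c] += duration * math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid
```

Weighting by duration lets longer fixations dominate the map, which is one reason predicting durations alongside coordinates matters for such visualizations.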
- [9] arXiv:2407.00167 (cross-list from cs.CL)
Title: Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach
Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang
Comments: Accepted for the AI Applications in Public Health and Social Services workshop at the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI)
In recent years, the United States has witnessed a significant surge in the popularity of vaping, or e-cigarette use. This led to a notable rise in cases of e-cigarette or vaping use-associated lung injury (EVALI), which caused hospitalizations and fatalities during the 2019 EVALI outbreak and highlighted the urgency of understanding vaping behaviors and developing effective cessation strategies. Social media platforms are ubiquitous: over 4.7 billion users worldwide use them for connectivity, communication, news, and entertainment, and a significant portion of the discourse relates to health, making social media data an invaluable organic resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Using OpenAI's latest large language model, GPT-4, for sentence-level detection of quit-vaping intentions, this study compares the model's outcomes against layman and clinical expert annotations. Using different prompting strategies, namely zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and evaluated the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.
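The prompting strategies named above differ mainly in how many labeled examples precede the target sentence and whether the model is asked to reason first. A minimal sketch of that structure; the task wording, label names, and `build_prompt` are hypothetical and are not any of the paper's 8 prompts.

```python
# Hypothetical task instruction; the paper's actual prompt wording is not given in the abstract.
TASK = ("Label the sentence as QUIT_INTENT if the author expresses an intention "
        "to quit vaping, otherwise NO_QUIT_INTENT.")


def build_prompt(sentence, examples=(), chain_of_thought=False):
    """Assemble a prompt: zero-shot with no examples, one-shot with one, few-shot with several."""
    parts = [TASK]
    for text, label in examples:  # in-context demonstrations
        parts.append(f"Sentence: {text}\nLabel: {label}")
    if chain_of_thought:
        parts.append("Think step by step before giving the label.")
    parts.append(f"Sentence: {sentence}\nLabel:")  # target sentence left for the model to complete
    return "\n\n".join(parts)
```

For example, `build_prompt(s)` yields a zero-shot prompt, while passing two `(sentence, label)` pairs as `examples` yields a few-shot one.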
- [10] arXiv:2407.00233 (cross-list from cs.CV)
Title: Methodology to Deploy CNN-Based Computer Vision Models on Immersive Wearable Devices
Authors: Kaveh Malek (1), Fernando Moreu (2) ((1) Department of Mechanical Engineering, University of New Mexico, New Mexico, (2) Department of Civil, Construction and Environmental Engineering, University of New Mexico, New Mexico)
Comments: 10 pages, 8 figures, 4300 words
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Convolutional Neural Network (CNN) models often lack the ability to incorporate human input, which can be addressed by Augmented Reality (AR) headsets. However, current AR headsets face limitations in processing power, which has prevented researchers from performing real-time, complex image recognition tasks using CNNs in AR headsets. This paper presents a method to deploy CNN models on AR headsets by training them on computers and transferring the optimized weight matrices to the headset. The approach transforms the image data and CNN layers into a one-dimensional format suitable for the AR platform. We demonstrate this method by training the LeNet-5 CNN model on the MNIST dataset using PyTorch and deploying it on a HoloLens AR headset. The results show that the model maintains an accuracy of approximately 98%, similar to its performance on a computer. This integration of CNN and AR enables real-time image processing on AR headsets, allowing for the incorporation of human input into AI models.
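The abstract does not give the exact data layout used on the HoloLens. As a hedged illustration of what a one-dimensional format can mean, the sketch below stores an image and a kernel as row-major flat arrays and computes a valid (unpadded) convolution purely by index arithmetic. Like PyTorch's `Conv2d`, it computes cross-correlation (no kernel flip). `flatten` and `conv2d_flat` are hypothetical names.

```python
def flatten(matrix):
    """Row-major flattening of a 2D list into a 1D list."""
    return [value for row in matrix for value in row]


def conv2d_flat(img, h, w, kernel, k):
    """Valid 2D cross-correlation of a flat h*w image with a flat k*k kernel.

    Element (r, c) of a row-major buffer with width w lives at index r * w + c.
    Returns a flat (h - k + 1) * (w - k + 1) output.
    """
    out_h, out_w = h - k + 1, w - k + 1
    out = [0.0] * (out_h * out_w)
    for r in range(out_h):
        for c in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += img[(r + i) * w + (c + j)] * kernel[i * k + j]
            out[r * out_w + c] = acc
    return out
```

Weights trained on a computer (for example, exported from PyTorch) could be flattened the same way before transfer, so the headset only needs flat arrays and index arithmetic.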
- [11] arXiv:2407.00463 (cross-list from cs.LG)
Title: Open-Source Conversational AI with SpeechBrain 1.0
Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, Jarod Duret, Salima Mdhaffar, Gaelle Laperriere, Renato De Mori, Yannick Esteve
Comments: Submitted to JMLR (Machine Learning Open Source Software)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. SpeechBrain promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.
- [12] arXiv:2407.00870 (cross-list from cs.CL)
Title: Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles
Authors: Ryan Louie (1), Ananjan Nandi (1), William Fang (1), Cheng Chang (1), Emma Brunskill (1), Diyi Yang (1) ((1) Stanford University)
Comments: 34 pages, 24 figures, 11 tables
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging: privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain expert and transforms it into a set of principles, or natural language rules, that govern an LLM-prompted roleplay. We apply this pipeline to enable senior mental health supporters to create customized AI patients that serve as simulated practice partners for novice counselors. After finding that GPT-4 simulations did not always adhere to expert-defined principles, we also introduce a novel principle-adherence prompting pipeline, which shows 30% improvements in response quality and principle following for the downstream task. Via a user study with 25 counseling experts, we demonstrate that the pipeline makes it easy and effective to create AI patients that more faithfully resemble real patients, as judged by creators and third-party counselors.
- [13] arXiv:2407.01067 (cross-list from cs.AI)
Title: Human-like object concept representations emerge naturally in multimodal large language models
Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the intriguing question of whether these models can also develop human-like object representations through exposure to vast amounts of linguistic and multimodal data. In this study, we combined behavioral and neuroimaging analysis methods to uncover how the object concept representations in LLMs correlate with those of humans. By collecting large-scale datasets of 4.7 million triplet judgments from LLM and Multimodal LLM (MLLM), we derived low-dimensional embeddings that capture the underlying similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were highly stable and predictive, and exhibited semantic clustering akin to human mental representations. Interestingly, the interpretability of the dimensions underlying these embeddings suggests that LLM and MLLM have developed human-like conceptual representations of natural objects. Further analysis demonstrated strong alignment between the identified model embeddings and neural activity patterns in many functionally defined brain regions of interest (ROIs; e.g., EBA, PPA, RSC and FFA). This provides compelling evidence that the object representations in LLMs, while not identical to those in humans, share fundamental commonalities that reflect key schemas of human conceptual knowledge. This study advances our understanding of machine intelligence and informs the development of more human-like artificial cognitive systems.
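The abstract does not describe the triplet task or how the embeddings are fit. Assuming the judgments are odd-one-out choices, as in SPoSE-style embedding methods, a learned embedding predicts each choice by finding the most similar pair in the triplet and naming the remaining object. A minimal sketch with dot-product similarity; `odd_one_out` and `triplet_accuracy` are hypothetical names.

```python
def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def odd_one_out(emb, a, b, c):
    """Predict the odd-one-out among objects a, b, c: the one left out of the most similar pair."""
    # Map each candidate odd-one-out to the similarity of the remaining pair.
    pair_sim = {
        c: dot(emb[a], emb[b]),
        b: dot(emb[a], emb[c]),
        a: dot(emb[b], emb[c]),
    }
    return max(pair_sim, key=pair_sim.get)


def triplet_accuracy(emb, triplets, choices):
    """Fraction of observed odd-one-out choices that the embedding reproduces."""
    hits = sum(odd_one_out(emb, *t) == choice for t, choice in zip(triplets, choices))
    return hits / len(triplets)
```

Under this assumption, "predictive" embeddings are those whose `triplet_accuracy` on held-out judgments is high.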
Cross submissions for Tuesday, 2 July 2024 (showing 8 of 8 entries)
- [14] arXiv:2401.01818 (replaced)
Title: SENS3: Multisensory Database of Finger-Surface Interactions and Corresponding Sensations
Comments: 15 pages, 3 tables, 3 figures, conference
Subjects: Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
The growing demand for natural interactions with technology underscores the importance of achieving realistic touch sensations in digital environments. Realizing this goal depends heavily on comprehensive databases of finger-surface interactions, which need further development. Here, we present SENS3 -- this http URL -- an extensive open-access repository of multisensory data acquired from fifty surfaces as two participants explored them with their fingertips through static contact, pressing, tapping, and sliding. SENS3 encompasses high-fidelity visual, audio, and haptic information recorded during these interactions, including videos, sounds, contact forces, torques, positions, accelerations, skin temperature, heat flux, and surface photographs. Additionally, it incorporates thirteen participants' psychophysical sensation ratings (rough-smooth, flat-bumpy, sticky-slippery, hot-cold, regular-irregular, fine-coarse, hard-soft, and wet-dry) collected while they explored these surfaces freely. Designed as an open-ended framework, SENS3 can be expanded with additional textures and participants. We anticipate that SENS3 will be valuable for advancing multisensory texture rendering, user experience development, and touch sensing in robotics.
- [15] arXiv:2402.16795 (replaced)
Title: If in a Crowdsourced Data Annotation Pipeline, a GPT-4
Comments: Accepted by CHI 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Recent studies indicated that GPT-4 outperforms online crowd workers in data labeling accuracy, notably workers from Amazon Mechanical Turk (MTurk). However, these studies were criticized for deviating from standard crowdsourcing practices and for emphasizing individual workers' performance over the whole data-annotation process. This paper compared GPT-4 with an ethical and well-executed MTurk pipeline, in which 415 workers labeled 3,177 sentence segments from 200 scholarly articles using the CODA-19 scheme. Two worker interfaces yielded 127,080 labels, which were then used to infer the final labels through eight label-aggregation algorithms. Our evaluation showed that despite best practices, the MTurk pipeline's highest accuracy was 81.5%, whereas GPT-4 achieved 83.6%. Interestingly, when GPT-4's labels were combined with crowd labels collected via an advanced worker interface for aggregation, 2 of the 8 algorithms achieved even higher accuracy (87.5% and 87.0%). Further analysis suggested that when the crowd's and GPT-4's labeling strengths are complementary, aggregating them could increase labeling accuracy.
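The eight aggregation algorithms are not named in the abstract. Majority voting is the simplest baseline of this kind; the sketch below (an assumption, not necessarily one of the eight) treats GPT-4 as one additional voter alongside the crowd workers for a single segment. The lowercase CODA-19 category names are illustrative, and `majority_vote` and `aggregate` are hypothetical names.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label that appears first."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def aggregate(crowd_labels, gpt4_label=None):
    """Infer one segment's final label from crowd labels, optionally adding GPT-4's label."""
    votes = list(crowd_labels)
    if gpt4_label is not None:
        votes.append(gpt4_label)  # GPT-4 counts as one extra worker
    return majority_vote(votes)
```

Adding GPT-4 as a voter only changes the outcome when the crowd is split, which is consistent with the idea that combining sources helps most where their strengths are complementary.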
- [16] arXiv:2404.14511 (replaced)
Title: Children's Overtrust and Shifting Perspectives of Generative AI
Journal-ref: Proceedings of the 18th International Society of the Learning Sciences (ICLS) 2024
Subjects: Human-Computer Interaction (cs.HC)
The capabilities of generative AI (genAI) have dramatically increased in recent times, and there are opportunities for children to leverage new features for personal and school-related endeavors. However, while the future of genAI is taking form, potentially harmful limitations remain, such as the generation of outputs with misinformation and bias. We ran a workshop study focused on ChatGPT to explore middle school girls' (N = 26) attitudes and reasoning about how genAI works. We focused on girls because they are often disproportionately impacted by algorithmic bias. We found that: (1) the girls were initially overtrusting of genAI; (2) deliberate exposure to the limitations and mistakes of generative AI shifted this overtrust to disillusionment about genAI capabilities, though they remained optimistic about the future possibilities of genAI; and (3) their ideas about school policy were nuanced. This work informs our understanding of how children think about genAI tools like ChatGPT and their integration into learning settings.
- [17] arXiv:2404.17602 (replaced)
Title: A Methodology and System For Big-Thick Data Collection
Comments: 8 pages, 3 figures, accepted by Aduous workshop
Subjects: Human-Computer Interaction (cs.HC)
Pervasive sensors have become essential in research for gathering real-world data. However, current studies often focus solely on objective data, neglecting subjective human contributions. We introduce an approach and system for collecting big-thick data, combining extensive sensor data (big data) with qualitative human feedback (thick data). This fusion enables effective collaboration between humans and machines, allowing machine learning to benefit from human behavior and interpretations. Emphasizing data quality, our system incorporates continuous monitoring and adaptive learning mechanisms to optimize data collection timing and context, ensuring relevance, accuracy, and reliability. The system comprises three key components: a) a tool for collecting sensor data and user feedback, b) components for experiment planning and execution monitoring, and c) a machine-learning component that enhances human-machine interaction.
- [18] arXiv:2406.19663 (replaced)
Title: Aerial Push-Button with Two-Stage Tactile Feedback using Reflected Airborne Ultrasound Focus
Comments: 9 pages, 15 figures, original manuscript edited by Microsoft Word
Subjects: Human-Computer Interaction (cs.HC); Emerging Technologies (cs.ET)
We developed a new aerial push-button with tactile feedback using focused airborne ultrasound. Compared with past related studies, this study has two significant novelties: 1) the ultrasound emitters are placed behind the user's finger, and the emitted ultrasound, reflected off a solid plane placed under the finger and focused just above it, presents tactile feedback to the finger pad; and 2) tactile feedback is presented at two stages of the pressing motion: when the button is pushed and when the finger is withdrawn from it. The former has a significant advantage in apparatus implementation, in that the input surface of the device can be any generic thin plane, including touch panels, potentially making it possible to present input touch feedback only when the user touches objects on the screen. We experimentally found that the two-stage tactile presentation is much more effective than a conventional single-stage method in strengthening both the perceived tactile stimulation and the feeling of input completion. This study proposes an aerial push-button configuration that is far more practical than previous ones. The proposed configuration is expected to be one of the simplest frameworks for airborne ultrasound tactile interfaces.
- [19] arXiv:2303.14007 (replaced)
Title: 'Team-in-the-loop': Ostrom's IAD framework 'rules in use' to map and measure contextual impacts of AI
Comments: 19 pages
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
This article explores how the 'rules in use' from Ostrom's Institutional Analysis and Development Framework (IAD) can be developed as a context analysis approach for AI. AI risk assessment frameworks increasingly highlight the need to understand existing contexts. However, these approaches rarely connect with established institutional analysis scholarship. We outline a novel direction, illustrated through a high-level example, to understand how clinical oversight is potentially impacted by AI. Much current thinking regarding oversight of AI revolves around the idea of decision makers being in-the-loop and thus having the capacity to intervene to prevent harm. However, our analysis finds that oversight is complex, is frequently carried out by teams of professionals, and relies on explanation to elicit information. Professional bodies and liability also function as institutions of polycentric oversight. All of these are affected by the challenge of overseeing AI systems. The approach outlined has potential utility as a policy tool for context analysis aligned with the 'Govern and Map' functions of the National Institute of Standards and Technology (NIST) AI Risk Management Framework; however, further empirical research is needed. Our analysis illustrates the benefit of existing institutional analysis approaches in foregrounding team structures within oversight and, thus, in conceptions of 'human in the loop'.
- [20] arXiv:2309.05196 (replaced)
Title: Does Writing with Language Models Reduce Content Diversity?
Comments: ICLR 2024
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment in which users write argumentative essays in three setups: using a base LLM (GPT3), using a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not GPT3) results in a statistically significant reduction in diversity. Specifically, it increases the similarity between the writings of different authors and reduces the overall lexical and content diversity. We additionally find that this effect is mainly attributable to InstructGPT contributing less diverse text to co-written essays. In contrast, the user-contributed text remains unaffected by model collaboration. This suggests that the recent improvement in generation quality from adapting models to human feedback might come at the cost of more homogeneous and less diverse content.
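The abstract does not list its diversity metrics. One simple way to quantify increased similarity between the writings of different authors is average pairwise similarity across essays; the sketch below uses word-set Jaccard similarity as a stand-in (an assumption; the paper's own metrics may differ), and `homogenization` is a hypothetical name.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of the lowercase word sets of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def homogenization(essays):
    """Mean pairwise similarity across essays by different authors (higher means less diverse)."""
    pairs = list(combinations(essays, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Comparing this score across the three setups (GPT3, InstructGPT, no model) would show the kind of group-level effect described above; it needs at least two essays per group.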
- [21] arXiv:2402.14601 (replaced)
Title: Bringing Generative AI to Adaptive Learning in Education
Authors: Hang Li, Tianlong Xu, Chaoli Zhang, Eason Chen, Jing Liang, Xing Fan, Haoyang Li, Jiliang Tang, Qingsong Wen
Comments: 14 pages, 7 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
The recent surge in generative AI technologies, such as large language models and diffusion models, has boosted the development of AI applications in various domains, including science, finance, and education. Concurrently, adaptive learning, a concept that has gained substantial interest in the educational sphere, has proven effective in enhancing students' learning efficiency. In this position paper, we aim to shed light on research at the intersection of these two areas, which combines generative AI with adaptive learning concepts. By discussing the benefits, challenges, and potential of this field, we argue that this union will contribute significantly to the development of the next-stage learning format in education.
- [22] arXiv:2404.05317 (replaced)
Title: WebXR, A-Frame and Networked-Aframe as a Basis for an Open Metaverse: A Conceptual Architecture
Comments: draftcls option
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
This work proposes a WebXR-based cross-platform conceptual architecture, leveraging the A-Frame and Networked-Aframe frameworks, to facilitate the development of an open, accessible, and interoperable metaverse. By introducing the concept of a spatial web app, this research contributes to the discourse on the metaverse, offering an architecture that democratizes access to virtual environments and extended reality through the web and aligns with Tim Berners-Lee's original vision of the World Wide Web as an open platform in the digital realm.
- [23] arXiv:2405.01354 (replaced)
Title: Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES)
Authors: Bahar Irfan, Jura Miniota, Sofia Thunberg, Erik Lagerstedt, Sanna Kuoppamäki, Gabriel Skantze, André Pereira
Comments: Under review at Transactions on Affective Computing. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
Understanding user enjoyment is crucial in human-robot interaction (HRI), as it can impact interaction quality and influence user acceptance and long-term engagement with robots, particularly in the context of conversations with social robots. However, current assessment methods rely solely on self-reported questionnaires, failing to capture interaction dynamics. This work introduces the Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES), a novel scale for assessing user enjoyment from an external perspective during conversations with a robot. Developed through rigorous evaluations and discussions among three annotators with relevant expertise, the scale provides a structured framework for assessing enjoyment in each conversation exchange (turn) alongside overall interaction levels. It aims to complement self-reported enjoyment from users and holds the potential for autonomously identifying user enjoyment in real-time HRI. The scale was validated on open-domain dialogues between 25 older adults and a companion robot powered by a large language model, corresponding to 174 minutes of data, and showed moderate to good alignment. The dataset is available online. Additionally, the study offers insights into understanding the nuances and challenges of assessing user enjoyment in robot interactions, and provides guidelines on applying the scale to other domains.
- [24] arXiv:2406.14097 (replaced)
Title: Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration
Authors: Haokun Liu, Yaonan Zhu, Kenji Kato, Atsushi Tsukahara, Izumi Kondo, Tadayoshi Aoyama, Yasuhisa Hasegawa
Comments: IEEE Robotics and Automation Letters
Journal-ref: IEEE Robotics and Automation Letters, vol. 9, no. 8, pp. 6904-6911, Aug. 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Large Language Models (LLMs) are gaining popularity in the field of robotics. However, LLM-based robots are limited to simple, repetitive motions due to the poor integration between language models, robots, and the environment. This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC). The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot. The system also employs a YOLO-based perception algorithm, providing visual cues to the LLM, which aids in planning feasible motions within the specific environment. Additionally, an HRC method is proposed by combining teleoperation and Dynamic Movement Primitives (DMP), allowing the LLM-based robot to learn from human guidance. Real-world experiments have been conducted using the Toyota Human Support Robot for manipulation tasks. The outcomes indicate that tasks requiring complex trajectory planning and reasoning over environments can be efficiently accomplished through the incorporation of human demonstrations.
- [25] arXiv:2406.14250 (replaced)
Title: E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion
Comments: 9 pages, 5 figures, Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Online GUI navigation on mobile devices has attracted considerable attention in recent years because it underpins many real-world applications. With the rapid development of large language models (LLMs), multimodal large language models (MLLMs) have tremendous potential for this task. However, existing MLLMs need high-quality data to improve their ability to make correct navigation decisions based on human user inputs. In this paper, we develop a novel and highly valuable dataset, named E-ANT, the first Chinese GUI navigation dataset that contains real human behaviour and high-quality annotated screenshots, comprising nearly 40,000 real human traces across more than 5,000 different tinyAPPs. Furthermore, we evaluate various powerful MLLMs on E-ANT and report experimental results with ablation studies. We believe that our proposed dataset will benefit both the evaluation and development of GUI navigation and LLM/MLLM decision-making capabilities.
- [26] arXiv:2406.14485 (replaced)
Title: Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)
Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zijin Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. The workshop was held at the 16th ACM Conference on Creativity and Cognition (C&C 2024) in Chicago, USA.