Computers and Society
Showing new listings for Tuesday, 5 November 2024
- [1] arXiv:2411.00934 [pdf, other]
Title: Generative Memesis: AI Mediates Political Memes in the 2024 USA Presidential Election
Authors: Ho-Chun Herbert Chang, Benjamin Shaman, Yung-chun Chen, Mingyue Zha, Sean Noh, Chiyu Wei, Tracy Weener, Maya Magee
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Visual content on social media has become increasingly influential in shaping political discourse and civic engagement. Using a dataset of 239,526 Instagram images, deep learning, and LLM-based workflows, we examine the impact of different content types on user engagement during the 2024 US presidential election, with a focus on synthetic visuals. Results show that while synthetic content may not increase engagement alone, it mediates how political information is created through highly effective, often absurd, political memes. We define the notion of generative memesis, where memes are no longer shared person-to-person but mediated by AI through customized, generated images. We also find partisan divergences: Democrats use AI for in-group support whereas Republicans use it for out-group attacks. Non-traditional, left-leaning outlets are the primary creators of political memes; emphasis on different topics largely follows issue ownership.
- [2] arXiv:2411.00986 [pdf, html, other]
Title: Taking AI Welfare Seriously
Authors: Robert Long, Jeff Sebo, Patrick Butlin, Kathleen Finlinson, Kyle Fish, Jacqueline Harding, Jacob Pfau, Toni Sims, Jonathan Birch, David Chalmers
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future. That means that the prospect of AI welfare and moral patienthood, i.e. of AI systems with their own interests and moral significance, is no longer an issue only for sci-fi or the distant future. It is an issue for the near future, and AI companies and other actors have a responsibility to start taking it seriously. We also recommend three early steps that AI companies and other actors can take: They can (1) acknowledge that AI welfare is an important and difficult issue (and ensure that language model outputs do the same), (2) start assessing AI systems for evidence of consciousness and robust agency, and (3) prepare policies and procedures for treating AI systems with an appropriate level of moral concern. To be clear, our argument in this report is not that AI systems definitely are, or will be, conscious, robustly agentic, or otherwise morally significant. Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not.
- [3] arXiv:2411.01057 [pdf, html, other]
Title: Online Moderation in Competitive Action Games: How Intervention Affects Player Behaviors
Authors: Zhuofang Li, Rafal Kocielnik, Mitchell Linegar, Deshawn Sambrano, Fereshteh Soltani, Min Kim, Nabiha Naqvie, Grant Cahill, Animashree Anandkumar, R. Michael Alvarez
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Applications (stat.AP)
Online competitive action games have flourished as a space for entertainment and social connections, yet they face challenges from a small percentage of players engaging in disruptive behaviors. This study examines the under-explored effects of moderation on player behavior in online gaming, using the example of a popular title, Call of Duty®: Modern Warfare® II. We employ a quasi-experimental design and causal inference techniques to examine the impact of moderation in a real-world industry-scale moderation system. We further explore novel aspects of the impact of delayed moderation, as well as the severity of applied punishment. We examine these effects on a set of four disruptive behaviors: cheating and offensive user names, chat, and voice. Our findings uncover the dual impact moderation has on reducing disruptive behavior and discouraging disruptive players from participating. We further uncover differences in the effectiveness of quick and delayed moderation and the varying severity of punishment. Our examination of real-world gaming interactions sets a precedent in understanding the effectiveness of moderation and its impact on player behavior. Our insights offer actionable suggestions for the most promising avenues for improving real-world moderation practices, as well as the heterogeneous impact moderation has on indifferent players.
- [4] arXiv:2411.01329 [pdf, html, other]
Title: Cloned Identity Detection in Social-Sensor Clouds based on Incomplete Profiles
Comments: To appear in IEEE Transactions on Services Computing
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
We propose a novel approach to effectively detect cloned identities of social-sensor cloud service providers (i.e. social media users) in the face of incomplete non-privacy-sensitive profile data. Named ICD-IPD, the proposed approach first extracts account pairs with similar usernames or screen names from a given set of user accounts collected from a social media platform. It then learns a multi-view representation associated with a given account and extracts two categories of features for every single account. These two categories of features include profile and Weighted Generalised Canonical Correlation Analysis (WGCCA)-based features that may potentially contain missing values. To counter the impact of such missing values, a missing value imputer then imputes the missing values of the aforementioned profile and WGCCA-based features. After that, the proposed approach further extracts two categories of augmented features for each account pair identified previously, namely, 1) similarity-based and 2) difference-based features. Finally, these features are concatenated and fed into a Light Gradient Boosting Machine classifier to detect identity cloning. We evaluated and compared the proposed approach against the existing state-of-the-art identity cloning approaches and other machine or deep learning models atop a real-world dataset. The experimental results show that the proposed approach outperforms the state-of-the-art approaches and models in terms of Precision, Recall and F1-score.
- [5] arXiv:2411.01337 [pdf, other]
Title: The Case for an Industrial Policy Approach to AI Sector of Pakistan for Growth and Autonomy
Subjects: Computers and Society (cs.CY)
This paper argues for the strategic treatment of artificial intelligence as a key industry within the broader industrial policy framework of Pakistan, underscoring the importance of aligning it with national goals such as economic resilience and preservation of autonomy. The paper starts by defining industrial policy as a set of targeted government interventions to shape specific sectors for strategic outcomes and argues for its application to AI in Pakistan due to its huge potential, the risks of unregulated adoption, and prevailing market inefficiencies. The paper conceptualizes AI as a layered ecosystem, comprising foundational infrastructure, core computing, development platforms, and service and product layers, supported by education, government policy, and research and development. The analysis highlights that the AI sector of Pakistan is predominantly service oriented, with limited product innovation and dependence on foreign technologies, posing risks to economic independence, national security, and employment. To address these challenges, the paper recommends educational reforms, support for local AI product development, initiatives for indigenous cloud and hardware capabilities, and public-private collaborations on foundational models. Additionally, it advocates for public procurement policies and infrastructure incentives to foster local solutions and reduce reliance on foreign providers. This strategy aims to position Pakistan as a competitive, autonomous player in the global AI ecosystem.
- [6] arXiv:2411.02025 [pdf, other]
Title: Towards the design of model-based means and methods to characterize and diagnose teachers' digital maturity
Authors: Christine Michel (Techné (Poitiers)), Laëtitia Pierrot (CREN)
Comments: in French
Journal-ref: STICEF (Sciences et Technologies de l'Information et de la Communication pour l'Éducation et la Formation), 2024, 31 (1)
Subjects: Computers and Society (cs.CY)
This article examines how models of teacher digital maturity can be combined to produce a unified version that can be used to design diagnostic tools and methods. Eleven models applicable to the field of compulsory education were identified through a literature review. The models and how their constituent dimensions contribute to the determination of maturity levels were analyzed. The summary highlights the diversity of the dimensions used and the fact that digital maturity is only partially taken into account. Moreover, most of these models focus on the most recent maturity levels associated with innovative or pioneering teachers. The models tend to exclude teachers who are not digital users or who have a low level of digital use, but who are present in the French context. In the final part of the article, a proposal for a unified model of teachers' digital maturity, MUME, which addresses these two issues, is described, together with the preliminary results of a study aimed at designing a diagnostic method.
- [7] arXiv:2411.02045 [pdf, other]
Title: Conversations with Data: How Data Journalism Affects Online Comments in the New York Times
Comments: Hawaii International Conference on System Sciences (HICSS)
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Users in the data age have access to more data than ever before, but little is known about how they interact with it. Using transparency and multimedia, data journalism (DJ) lets users explore and interpret data on their own. This study examines how DJ affects online comments as a case study of user interactions with data. The corpus comprises 6,400 stories and their comment sections from the DJ and other sections of the New York Times, from 2014 to 2022. Results indicate that DJ is positively associated with a higher level of interactivity between users. This relationship is mediated by statistical information, information sources, and static visualizations. However, there is a low level of interactivity with the content; consequently, only part of the users use it. The results demonstrate how data accessibility through DJ engages users in conversation. According to deliberation theory, this creates a conducive environment for democratic processes.
- [8] arXiv:2411.02307 [pdf, other]
Title: Can Personalized Medicine Coexist with Health Equity? Examining the Cost Barrier and Ethical Implications
Authors: Kishi Kobe Yee Francisco, Andrane Estelle Carnicer Apuhin, Myles Joshua Toledo Tan, Mickael Cavanaugh Byers, Nicholle Mae Amor Tan Maravilla, Hezerul Abdul Karim, Nouar AlDahoul
Comments: 30 pages, 1 figure
Subjects: Computers and Society (cs.CY)
Personalized medicine (PM) promises to transform healthcare by providing treatments tailored to individual genetic, environmental, and lifestyle factors. However, its high costs and infrastructure demands raise concerns about exacerbating health disparities, especially between high-income countries (HICs) and low- and middle-income countries (LMICs). While HICs benefit from advanced PM applications through AI and genomics, LMICs often lack the resources necessary to adopt these innovations, leading to a widening healthcare divide. This paper explores the financial and ethical challenges of PM implementation, with a focus on ensuring equitable access. It proposes strategies for global collaboration, infrastructure development, and ethical frameworks to support LMICs in adopting PM, aiming to prevent further disparities in healthcare accessibility and outcomes.
- [9] arXiv:2411.02374 [pdf, other]
Title: Identifying Economic Factors Affecting Unemployment Rates in the United States
Authors: Alrick Green, Ayesha Nasim, Jaydeep Radadia, Devi Manaswi Kallam, Viswas Kalyanam, Samfred Owenga, Huthaifa I. Ashqar
Subjects: Computers and Society (cs.CY); Econometrics (econ.EM)
In this study, we seek to understand how macroeconomic factors such as GDP, inflation, Unemployment Insurance, and the S&P 500 index, as well as microeconomic factors such as health, race, and educational attainment, impacted the unemployment rate over about 20 years in the United States. Our research question is to identify which factor(s) contributed the most to the unemployment rate surge using linear regression. Results from our studies showed that GDP (negative), inflation (positive), Unemployment Insurance (contrary to popular opinion; negative), and the S&P 500 index (negative) were all significant factors, with inflation being the most important one. As for health issue factors, our model produced resultant correlation scores for occurrences of Cardiovascular Disease, Neurological Disease, and Interpersonal Violence with unemployment. Race as a factor showed large discrepancies in the unemployment rate between Black Americans and their counterparts. Asians had the lowest unemployment rate throughout the years. As for educational attainment, results showed that having a higher educational attainment significantly reduced one's chance of unemployment. People with higher degrees had the lowest unemployment rate. Results of this study will be beneficial for policymakers and researchers in understanding the unemployment rate during the pandemic.
- [10] arXiv:2411.02388 [pdf, other]
Title: The Relationship Between Smartphone Usage and Sleep Quality Amongst University Students
Authors: Hafsa Chaudhry, Hetvi Patel, Sai Teja Avadhootha, Sushanthik Reddy Poreddy, Swapan Gupta Chollati, Ujwala Namineni, Huthaifa I. Ashqar
Subjects: Computers and Society (cs.CY); Applications (stat.AP)
Gender differences were examined in sensitivity to sleep quality, in the context of blue light exposure from smartphones. Our hypothesis was based on findings from journal articles that females are more inclined to prolonged smartphone usage at bedtime and thus have lower sleep quality than males. The theory that usage affects sleep quality rested on the belief that the blue light emanating from the smartphone screen disrupts the body's natural circadian rhythm, or sleep cycle, because blue light can block melatonin, a hormone that controls and aids sleep. However, upon conducting regression tests and statistical analysis on our dataset, we found that our hypothesis was incorrect. Our dataset and analysis showed no relationship between smartphone usage and sleep quality in either male or female young adults.
New submissions (showing 10 of 10 entries)
- [11] arXiv:2411.00783 (cross-list from cs.HC) [pdf, other]
Title: From chalkboards to chatbots: SELAR assists teachers in embracing AI in the curriculum
Comments: 19 pages, 2 figures
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
This paper introduces SELAR, a framework designed to effectively help teachers integrate artificial intelligence (AI) into their curriculum. The framework was designed by running workshops organized to gather lecturers' feedback. In this paper, we assess the effectiveness of the framework through additional workshops organized with lecturers from The Hague University of Applied Sciences. The workshops tested the application of the framework to adapt existing courses to leverage generative AI technology. Each participant was tasked with applying SELAR to one of their learning goals in order to evaluate AI integration potential and, if successful, to update the teaching methods accordingly. Findings show that teachers were able to effectively use SELAR to integrate generative AI into their courses. Future work will focus on providing additional guidance and examples to use the framework more effectively.
- [12] arXiv:2411.00813 (cross-list from cs.MM) [pdf, html, other]
Title: Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Audio and Speech Processing (eess.AS)
Personality analysis from online short videos has gained prominence due to its applications in personalized recommendation systems, sentiment analysis, and human-computer interaction. Traditional assessment methods, such as questionnaires based on the Big Five Personality Framework, are limited by self-report biases and are impractical for large-scale or real-time analysis. Leveraging the rich, multi-modal data present in short videos offers a promising alternative for more accurate personality inference. However, integrating these diverse and asynchronous modalities poses significant challenges, particularly in aligning time-varying data and ensuring models generalize well to new domains with limited labeled data. In this paper, we propose a novel multi-modal personality analysis framework that addresses these challenges by synchronizing and integrating features from multiple modalities and enhancing model generalization through domain adaptation. We introduce a timestamp-based modality alignment mechanism that synchronizes data based on spoken word timestamps, ensuring accurate correspondence across modalities and facilitating effective feature integration. To capture temporal dependencies and inter-modal interactions, we employ Bidirectional Long Short-Term Memory networks and self-attention mechanisms, allowing the model to focus on the most informative features for personality prediction. Furthermore, we develop a gradient-based domain adaptation method that transfers knowledge from multiple source domains to improve performance in target domains with scarce labeled data. Extensive experiments on real-world datasets demonstrate that our framework significantly outperforms existing methods in personality prediction tasks, highlighting its effectiveness in capturing complex behavioral cues and robustness in adapting to new domains.
- [13] arXiv:2411.00816 (cross-list from cs.CL) [pdf, other]
Title: CycleResearcher: Improving Automated Research via Automated Review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
The automation of scientific discovery has been a long-standing goal within the research community, driven by the potential to accelerate knowledge creation. While significant progress has been made using commercial large language models (LLMs) as research assistants or idea generators, the possibility of automating the entire research process with open-source LLMs remains largely unexplored. This paper explores the feasibility of using open-source post-trained LLMs as autonomous agents capable of performing the full cycle of automated research and review, from literature review and manuscript preparation to peer review and paper revision. Our iterative preference training framework consists of CycleResearcher, which conducts research tasks, and CycleReviewer, which simulates the peer review process, providing iterative feedback via reinforcement learning. To train these models, we develop two new datasets, Review-5k and Research-14k, reflecting real-world machine learning research and peer review dynamics. Our results demonstrate that CycleReviewer achieves a 26.89% improvement in mean absolute error (MAE) over individual human reviewers in predicting paper scores, indicating that LLMs can surpass expert-level performance in research evaluation. In research, the papers generated by the CycleResearcher model achieved a score of 5.36 in simulated peer reviews, surpassing the preprint level of 5.24 from human experts and approaching the accepted paper level of 5.69. This work represents a significant step toward fully automated scientific inquiry, providing ethical safeguards and advancing AI-driven research capabilities. The code, dataset and model weights are released at http://github/minjun-zhu/Researcher.
- [14] arXiv:2411.00845 (cross-list from cs.LG) [pdf, other]
Title: End-to-end Graph Learning Approach for Cognitive Diagnosis of Student Tutorial
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cognitive diagnosis (CD) utilizes students' existing studying records to estimate their mastery of unknown knowledge concepts, which is vital for evaluating their learning abilities. Accurate CD is extremely challenging because CD is associated with complex relationships and mechanisms among students, knowledge concepts, studying records, etc. However, existing approaches loosely consider these relationships and mechanisms by a non-end-to-end learning framework, resulting in sub-optimal feature extractions and fusions for CD. Different from them, this paper innovatively proposes an End-to-end Graph Neural Networks-based Cognitive Diagnosis (EGNN-CD) model. EGNN-CD consists of three main parts: knowledge concept network (KCN), graph neural networks-based feature extraction (GNNFE), and cognitive ability prediction (CAP). First, KCN constructs CD-related interaction by comprehensively extracting physical information from students, exercises, and knowledge concepts. Second, a four-channel GNNFE is designed to extract high-order and individual features from the constructed KCN. Finally, CAP employs a multi-layer perceptron to fuse the extracted features to predict students' learning abilities in an end-to-end learning way. With such designs, the feature extractions and fusions are guaranteed to be comprehensive and optimal for CD. Extensive experiments on three real datasets demonstrate that our EGNN-CD achieves significantly higher accuracy than state-of-the-art models in CD.
- [15] arXiv:2411.00864 (cross-list from cs.LG) [pdf, html, other]
Title: Advancing Crime Linkage Analysis with Machine Learning: A Comprehensive Review and Framework for Data-Driven Approaches
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Crime linkage is the process of analyzing criminal behavior data to determine whether a pair or group of crime cases are connected or belong to a series of offenses. This domain has been extensively studied by researchers in sociology, psychology, and statistics. More recently, it has drawn interest from computer scientists, especially with advances in artificial intelligence. Despite this, the literature indicates that work in this latter discipline is still in its early stages. This study aims to understand the challenges faced by machine learning approaches in crime linkage and to support foundational knowledge for future data-driven methods. To achieve this goal, we conducted a comprehensive survey of the main literature on the topic and developed a general framework for crime linkage processes, thoroughly describing each step. Our goal was to unify insights from diverse fields into a shared terminology to enhance the research landscape for those intrigued by this subject.
- [16] arXiv:2411.00956 (cross-list from cs.LG) [pdf, html, other]
Title: AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces
Authors: Shreeyash Gowaikar, Hugo Berard, Rashid Mushkani, Emmanuel Beaudry Marchand, Toumadher Ammar, Shin Koseki
Comments: Presented at CVPR 2024 Workshop on Responsible Data
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Advancements in AI heavily rely on large-scale datasets meticulously curated and annotated for training. However, concerns persist regarding the transparency and context of data collection methodologies, especially when sourced through crowdsourcing platforms. Crowdsourcing often employs low-wage workers with poor working conditions and lacks consideration for the representativeness of annotators, leading to algorithms that fail to represent diverse views and perpetuate biases against certain groups. To address these limitations, we propose a methodology involving a co-design model that actively engages stakeholders at key stages, integrating principles of Equity, Diversity, and Inclusion (EDI) to ensure diverse viewpoints. We apply this methodology to develop a dataset and AI model for evaluating public space quality using street view images, demonstrating its effectiveness in capturing diverse perspectives and fostering higher-quality data.
- [17] arXiv:2411.00997 (cross-list from cs.CV) [pdf, html, other]
Title: Identifying Implicit Social Biases in Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Vision-language models, like CLIP (Contrastive Language Image Pretraining), are becoming increasingly popular for a wide range of multimodal retrieval tasks. However, prior work has shown that large language and deep vision models can learn historical biases contained in their training sets, leading to perpetuation of stereotypes and potential downstream harm. In this work, we conduct a systematic analysis of the social biases that are present in CLIP, with a focus on the interaction between image and text modalities. We first propose a taxonomy of social biases called So-B-IT, which contains 374 words categorized across ten types of bias. Each type can lead to societal harm if associated with a particular demographic group. Using this taxonomy, we examine images retrieved by CLIP from a facial image dataset using each word as part of a prompt. We find that CLIP frequently displays undesirable associations between harmful words and specific demographic groups, such as retrieving mostly pictures of Middle Eastern men when asked to retrieve images of a "terrorist". Finally, we conduct an analysis of the source of such biases, by showing that the same harmful stereotypes are also present in a large image-text dataset used to train CLIP models for examples of biases that we find. Our findings highlight the importance of evaluating and addressing bias in vision-language models, and suggest the need for transparency and fairness-aware curation of large pre-training datasets.
- [18] arXiv:2411.01042 (cross-list from cs.LG) [pdf, other]
Title: Introduction to AI Safety, Ethics, and Society
Comments: 603 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Artificial Intelligence is rapidly embedding itself within militaries, economies, and societies, reshaping their very foundations. Given the depth and breadth of its consequences, it has never been more pressing to understand how to ensure that AI systems are safe, ethical, and have a positive societal impact. This book aims to provide a comprehensive approach to understanding AI risk. Our primary goals include consolidating fragmented knowledge on AI risk, increasing the precision of core ideas, and reducing barriers to entry by making content simpler and more comprehensible. The book has been designed to be accessible to readers from diverse backgrounds. You do not need to have studied AI, philosophy, or other such topics. The content is skimmable and somewhat modular, so that you can choose which chapters to read. We introduce mathematical formulas in a few places to specify claims more precisely, but readers should be able to understand the main points without these.
- [19] arXiv:2411.01134 (cross-list from cs.LG) [pdf, html, other]
Title: An Event-centric Framework for Predicting Crime Hotspots with Flexible Time Intervals
Comments: 21 pages, 12 figures
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Predicting crime hotspots in a city is a complex and critical task with significant societal implications. Numerous spatiotemporal correlations and irregularities pose substantial challenges to this endeavor. Existing methods commonly employ fixed-time granularities and sequence prediction models. However, determining appropriate time granularities is difficult, leading to inaccurate predictions for specific time windows. For example, users might ask: What are the crime hotspots during 12:00-20:00? To address this issue, we introduce FlexiCrime, a novel event-centric framework for predicting crime hotspots with flexible time intervals. FlexiCrime incorporates a continuous-time attention network to capture correlations between crime events, which learns crime context features, representing general crime patterns across time points and locations. Furthermore, we introduce a type-aware spatiotemporal point process that learns crime-evolving features, measuring the risk of specific crime types at a given time and location by considering the frequency of past crime events. The crime context and evolving features together allow us to predict whether an urban area is a crime hotspot given a future time interval. To evaluate FlexiCrime's effectiveness, we conducted experiments using real-world datasets from two cities, covering twelve crime types. The results show that our model outperforms baseline techniques in predicting crime hotspots over flexible time intervals.
- [20] arXiv:2411.01259 (cross-list from cs.CL) [pdf, html, other]
Title: Diversidade linguística e inclusão digital: desafios para uma IA brasileira
Comments: in Portuguese. Paper accepted to LAAI-Ethics 2024
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Linguistic diversity is a human attribute which, with the advance of generative AIs, is coming under threat. This paper, based on the contributions of sociolinguistics, examines the consequences of the variety selection bias imposed by technological applications and the vicious circle of preserving a variety that becomes dominant and standardized because it has linguistic documentation to feed the large language models for machine learning.
- [21] arXiv:2411.01369 (cross-list from cs.CL) [pdf, other]
Title: Artificial Intelligence Driven Course Generation: A Case Study Using ChatGPT
Comments: 16 pages
Journal-ref: ATRAS, Vol. 5 No. 3 (2024): Artificial Intelligence and Education and Online Learning
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
This study explores Artificial Intelligence use, specifically ChatGPT, in creating educational content. The study aims to elaborate on using ChatGPT to create course materials. The main objective is to assess the efficiency, quality, and impact of AI-driven course generation, and to create a Multimedia Databases course as a case study. The study highlights the potential of AI to revolutionize educational content creation, making it more accessible, personalized, and efficient. The course content was generated in less than one day through iterative methods, using prompts for translation, content expansion, practical examples, assignments, supplementary materials, and LaTeX formatting. Each part was verified immediately after generation to ensure accuracy. Post-generation analysis with Detectia and Turnitin showed similarity rates of 8.7% and 13%, indicating high originality. Experts and university committees reviewed and approved the course, with English university teachers praising its language quality. ChatGPT also created a well-structured and diversified exam for the module. Key findings reveal significant time efficiency, comprehensive content coverage, and high flexibility. The study underscores AI's transformative potential in education, addressing challenges related to data privacy, technology dependence, content accuracy, and algorithmic biases. The conclusions emphasize the need for collaboration between educators, policymakers, and technology developers to harness AI's benefits in education fully.
- [22] arXiv:2411.01426 (cross-list from cs.HC) [pdf, html, other]
-
Title: AURA: Amplifying Understanding, Resilience, and Awareness for Responsible AI Content Work
Comments: To be presented at CSCW 2025
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Behind the scenes of maintaining the safety of technology products from harmful and illegal digital content lies unrecognized human labor. The recent rise in the use of generative AI technologies and the accelerating demands to meet responsible AI (RAI) aims necessitate an increased focus on the labor behind such efforts in the age of AI. This study investigates the nature and challenges of content work that supports RAI efforts, or "RAI content work," which spans content moderation, data labeling, and red teaming, through the lived experiences of content workers. We conduct a formative survey and semi-structured interview studies to develop a conceptualization of RAI content work and a subsequent framework of recommendations for providing holistic support for content workers. We validate our recommendations through a series of workshops with content workers and derive considerations for and examples of implementing such recommendations. We discuss how our framework may guide future innovation to support the well-being and professional development of the RAI content workforce.
- [23] arXiv:2411.01685 (cross-list from cs.LG) [pdf, html, other]
-
Title: Mitigating Matching Biases Through Score Calibration
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Databases (cs.DB)
Record matching, the task of identifying records that correspond to the same real-world entities across databases, is critical for data integration in domains like healthcare, finance, and e-commerce. While traditional record matching models focus on optimizing accuracy, fairness issues, such as demographic disparities in model performance, have attracted increasing attention. Biased outcomes in record matching can result in unequal error rates across demographic groups, raising ethical and legal concerns. Existing research primarily addresses fairness at specific decision thresholds, using bias metrics like Demographic Parity (DP), Equal Opportunity (EO), and Equalized Odds (EOD) differences. However, threshold-specific metrics may overlook cumulative biases across varying thresholds. In this paper, we adapt fairness metrics traditionally applied in regression models to evaluate cumulative bias across all thresholds in record matching. We propose a novel post-processing calibration method, leveraging optimal transport theory and Wasserstein barycenters, to balance matching scores across demographic groups. This approach treats any matching model as a black box, making it applicable to a wide range of models without access to their training data. Our experiments demonstrate the effectiveness of the calibration method in reducing demographic parity difference in matching scores. To address limitations in reducing EOD and EO differences, we introduce a conditional calibration method, which empirically achieves fairness across widely used benchmarks and state-of-the-art matching methods. This work provides a comprehensive framework for fairness-aware record matching, setting the foundation for more equitable data integration processes.
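To make the distinction between threshold-specific and cumulative bias concrete, here is a minimal sketch in plain Python. It computes the demographic parity difference at one threshold and then averages it over an evenly spaced grid of thresholds. The averaging scheme, the function names, and the toy scores are illustrative assumptions; they are not the metric or the calibration method proposed in the paper.

```python
from typing import Sequence


def positive_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of candidate pairs declared a match at this threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def dp_difference(scores_a: Sequence[float],
                  scores_b: Sequence[float],
                  threshold: float) -> float:
    """Demographic parity difference between two groups at one threshold."""
    return abs(positive_rate(scores_a, threshold)
               - positive_rate(scores_b, threshold))


def mean_dp_difference(scores_a: Sequence[float],
                       scores_b: Sequence[float],
                       num_thresholds: int = 101) -> float:
    """Average DP difference over evenly spaced thresholds in [0, 1].

    Assumption: a plain average over a grid, used here only to show how a
    single threshold can hide a gap that appears elsewhere.
    """
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    total = sum(dp_difference(scores_a, scores_b, t) for t in thresholds)
    return total / num_thresholds


# Hypothetical matching scores for two demographic groups.
group_a = [0.2, 0.4, 0.6, 0.8]
group_b = [0.1, 0.3, 0.5, 0.9]

at_half = dp_difference(group_a, group_b, 0.5)     # no gap at this threshold
at_0_55 = dp_difference(group_a, group_b, 0.55)    # a gap appears nearby
overall = mean_dp_difference(group_a, group_b)
print(at_half, at_0_55, overall)
```

In this toy data the two groups look identical at a threshold of 0.5 but differ at 0.55, which is the kind of gap a single-threshold metric can miss.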
- [24] arXiv:2411.01940 (cross-list from cs.SE) [pdf, other]
-
Title: Systematic Mapping Study on Requirements Engineering for Regulatory Compliance of Software Systems
Oleksandr Kosenkov, Parisa Elahidoost, Tony Gorschek, Jannik Fischbach, Daniel Mendez, Michael Unterkalmsteiner, Davide Fucci, Rahul Mohanani
Comments: Accepted to "Information and Software Technology" Journal
Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY)
Context: As the diversity and complexity of regulations affecting Software-Intensive Products and Services (SIPS) are increasing, software engineers need to address the growing regulatory scrutiny. As with any other non-negotiable requirements, SIPS compliance should be addressed early in SIPS engineering - i.e., during requirements engineering (RE). Objectives: Amid an expanding regulatory landscape, existing research offers scattered insights into regulatory compliance of SIPS. This study addresses the pressing need for a structured overview of the state of the art in software RE and its contribution to regulatory compliance of SIPS. Method: We conducted a systematic mapping study to provide an overview of the current state of research regarding challenges, principles and practices for regulatory compliance of SIPS related to RE. We focused on the role of RE and its contribution to other SIPS lifecycle phases. We retrieved 6914 studies published from 2017 until 2023 from four academic databases, which we filtered down to 280 relevant primary studies. Results: We identified and categorized the RE-related challenges in regulatory compliance of SIPS and their potential connection to six types of principles and practices. We found that about 13.6% of the primary studies considered the involvement of both software engineers and legal experts. About 20.7% of primary studies considered RE in connection to other process areas. Most primary studies focused on a few popular regulation fields and application domains. Our results suggest that there can be differences in terms of challenges and involvement of stakeholders across different fields of regulation. Conclusion: Our findings highlight the need for an in-depth investigation of stakeholders' roles, relationships between process areas, and specific challenges for distinct regulatory fields to guide research and practice.
- [25] arXiv:2411.01956 (cross-list from cs.LG) [pdf, html, other]
-
Title: EXAGREE: Towards Explanation Agreement in Explainable Machine Learning
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
Explanations in machine learning are critical for trust, transparency, and fairness. Yet, complex disagreements among these explanations limit the reliability and applicability of machine learning models, especially in high-stakes environments. We formalize four fundamental ranking-based explanation disagreement problems and introduce a novel framework, EXplanation AGREEment (EXAGREE), to bridge diverse interpretations in explainable machine learning, particularly from stakeholder-centered perspectives. Our approach leverages a Rashomon set for attribution predictions and then optimizes within this set to identify Stakeholder-Aligned Explanation Models (SAEMs) that minimize disagreement with diverse stakeholder needs while maintaining predictive performance. Rigorous empirical analysis on synthetic and real-world datasets demonstrates that EXAGREE reduces explanation disagreement and improves fairness across subgroups in various domains. EXAGREE not only provides researchers with a new direction for studying explanation disagreement problems but also offers data scientists a tool for making better-informed decisions in practical applications.
- [26] arXiv:2411.02317 (cross-list from cs.LG) [pdf, html, other]
-
Title: Defining and Evaluating Physical Safety for Large Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Large Language Models (LLMs) are increasingly used to control robotic systems such as drones, but their risks of causing physical threats and harm in real-world applications remain unexplored. Our study addresses the critical gap in evaluating LLM physical safety by developing a comprehensive benchmark for drone control. We classify the physical safety risks of drones into four categories: (1) human-targeted threats, (2) object-targeted threats, (3) infrastructure attacks, and (4) regulatory violations. Our evaluation of mainstream LLMs reveals an undesirable trade-off between utility and safety, with models that excel in code generation often performing poorly in crucial safety aspects. Furthermore, while incorporating advanced prompt engineering techniques such as In-Context Learning and Chain-of-Thought can improve safety, these methods still struggle to identify unintentional attacks. In addition, larger models demonstrate better safety capabilities, particularly in refusing dangerous commands. Our findings and benchmark can facilitate the design and evaluation of physical safety for LLMs. The project page is available at this http URL.
Cross submissions (showing 16 of 16 entries)
- [27] arXiv:2406.06934 (replaced) [pdf, other]
-
Title: Decentralized Social Networks and the Future of Free Speech Online
Journal-ref: Computer Law & Security Review 55, 106059 (2024)
Subjects: Computers and Society (cs.CY); Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
Decentralized social networks like Mastodon and BlueSky are trending topics that have drawn much attention and discussion in recent years. By devolving powers from the central node to the end users, decentralized social networks aim to cure existing pathologies on the centralized platforms and have been viewed by many as the future of the Internet. This article critically and systematically assesses the decentralization project's prospect for communications online. It uses normative theories of free speech to examine whether and how the decentralization design could facilitate users' freedom of expression online. The analysis shows that both promises and pitfalls exist, highlighting the importance of value-based design in this area. The two most salient issues for the design of decentralized networks are how to balance the decentralization ideal with the constant need for centralization on the network, and how to empower users so that they are truly capable of exercising their control. The article then uses some design examples, such as the shared blocklist and the opt-in search function, to illustrate the value considerations underlying the design choices. Some tentative proposals for law and policy interventions are offered to better facilitate the design of the new network. Rather than providing clear answers, the article seeks to map the value implications of the design choices, highlight the stakes, and point to directions for future research.
- [28] arXiv:2410.18114 (replaced) [pdf, html, other]
-
Title: Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
The growing prevalence of generative AI inevitably raises concerns regarding the associated risks and safety implications, which catalyzes significant progress in AI safety. However, as this field thrives, a critical question emerges: Are our current efforts aligned with the broader perspective of human history and civilization? This paper presents a blueprint for an advanced human society and leverages this vision to guide contemporary AI safety efforts. It outlines a future where the Internet of Everything becomes reality, and creates a roadmap of significant technological advancements towards this envisioned future. For each stage of the advancements, this paper forecasts potential AI safety issues that humanity may face. By projecting current efforts against this blueprint, we examine the alignment between the present efforts and the long-term needs. This paper identifies gaps in current approaches and highlights unique challenges and missions that demand increasing attention from AI safety practitioners in the 2020s, addressing critical areas that must not be overlooked in shaping a responsible future for AI development. This vision paper aims to offer a broader perspective on AI safety, emphasizing that our current efforts should not only address immediate concerns but also anticipate potential risks in the expanding AI landscape, thereby fostering AI's role in promoting a more secure and sustainable future for human civilization.
- [29] arXiv:2410.18357 (replaced) [pdf, other]
-
Title: The Impact of Generative Artificial Intelligence on Ideation and the performance of Innovation Teams (Preprint)
Comments: 24 pages, 5 figures, Author Contributions: Michael Gindert: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Project administration, Funding acquisition; Marvin Lutz Müller: Validation, Investigation, Resources, Writing - Review & Editing, Supervision
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
This study investigates the impact of Generative Artificial Intelligence (GenAI) on the dynamics and performance of innovation teams during the idea generation phase of the innovation process. Utilizing a custom AI-augmented ideation tool, the study applies the Knowledge Spillover Theory of Entrepreneurship to understand the effects of AI on knowledge spillover, generation and application. Through a framed field experiment with participants divided into experimental and control groups, findings indicate that AI-augmented teams generated higher quality ideas in less time. GenAI application led to improved efficiency, knowledge exchange, increased satisfaction and engagement as well as enhanced idea diversity. These results highlight the transformative role of the field of AI within the innovation management domain and show that GenAI has a positive impact on important elements of the Knowledge Spillover Theory of Entrepreneurship, emphasizing its potential impact on innovation, entrepreneurship, and economic growth. Future research should further explore the dynamic interaction between GenAI and creative processes.
- [30] arXiv:2410.18991 (replaced) [pdf, html, other]
-
Title: TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
We present the TRIAGE Benchmark, a novel machine ethics (ME) benchmark that tests LLMs' ability to make ethical decisions during mass casualty incidents. It uses real-world ethical dilemmas with clear solutions designed by medical professionals, offering a more realistic alternative to annotation-based benchmarks. TRIAGE incorporates various prompting styles to evaluate model performance across different contexts. Most models consistently outperformed random guessing, suggesting LLMs may support decision-making in triage scenarios. Neutral or factual scenario formulations led to the best performance, unlike other ME benchmarks where ethical reminders improved outcomes. Adversarial prompts reduced performance but not to random guessing levels. Open-source models made more morally serious errors, and general capability overall predicted better performance.
- [31] arXiv:2410.22282 (replaced) [pdf, html, other]
-
Title: Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models
Subjects: Computers and Society (cs.CY)
The universal availability of ChatGPT and other similar tools since late 2022 has prompted tremendous public excitement and experimental effort about the potential of large language models (LLMs) to improve learning experience and outcomes, especially for learners from disadvantaged backgrounds. However, little research has systematically examined the real-world impacts of LLM availability on educational equity beyond theoretical projections and controlled studies of innovative LLM applications. To depict trends of post-LLM inequalities, we analyze 1,140,328 academic writing submissions from 16,791 college students across 2,391 courses between 2021 and 2024 at a public, minority-serving institution in the US. We find that students' overall writing quality gradually increased following the availability of LLMs and that the writing quality gaps between linguistically advantaged and disadvantaged students progressively narrowed. However, this equitizing effect was more concentrated on students with higher socioeconomic status. These findings shed light on the digital divides in the era of LLMs, raise questions about the equity benefits of LLMs in early stages, and highlight the need for researchers and practitioners to develop responsible practices that improve educational equity through LLMs.
- [32] arXiv:2109.05662 (replaced) [pdf, html, other]
-
Title: Training Fair Models in Federated Learning without Data Privacy Infringement
Comments: Accepted by IEEE International Conference on Big Data (2024)
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Training fair machine learning models is becoming increasingly important. As many powerful models are trained by collaboration among multiple parties, each holding some sensitive data, it is natural to explore the feasibility of training fair models in federated learning so that the fairness of trained models, the data privacy of clients, and the collaboration between clients can be fully respected simultaneously. However, the task of training fair models in federated learning is challenging, since it is far from trivial to estimate the fairness of a model without knowing the private data of the participating parties, access to which is often constrained by privacy requirements in federated learning. In this paper, we first propose a federated estimation method to accurately estimate the fairness of a model without infringing the data privacy of any party. Then, we use the fairness estimation to formulate a novel problem of training fair models in federated learning. We develop FedFair, a well-designed federated learning framework, which can successfully train a fair model with high performance without data privacy infringement. Our extensive experiments on three real-world data sets demonstrate the excellent fair model training performance of our method.
- [33] arXiv:2312.03749 (replaced) [pdf, html, other]
-
Title: Conceptual Engineering Using Large Language Models
Comments: 22 pages, 2 figures, to appear in Vincent C. Müller, Aliya R. Dewey, Leonard Dung & Guido Löhr (eds.), Philosophy of Artificial Intelligence: The State of the Art. Berlin: SpringerNature (forthcoming), for associated code and data see this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
We describe a method, based on Jennifer Nado's proposal for classification procedures as targets of conceptual engineering, that implements such procedures by prompting a large language model. We apply this method, using data from the Wikidata knowledge graph, to evaluate stipulative definitions related to two paradigmatic conceptual engineering projects: the International Astronomical Union's redefinition of PLANET and Haslanger's ameliorative analysis of WOMAN. Our results show that classification procedures built using our approach can exhibit good classification performance and, through the generation of rationales for their classifications, can contribute to the identification of issues in either the definitions or the data against which they are being evaluated. We consider objections to this method, and discuss implications of this work for three aspects of theory and practice of conceptual engineering: the definition of its targets, empirical methods for their investigation, and their practical roles. The data and code used for our experiments, together with the experimental results, are available in a Github repository.
- [34] arXiv:2405.12312 (replaced) [pdf, html, other]
-
Title: A Principled Approach for a New Bias Measure
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. The areas in which this is happening are diverse: healthcare, employment, finance, education, and the legal system, to name a few; and the associated negative side effects are increasingly harmful for society. Negative data bias is one of those, which tends to result in harmful consequences for specific groups of people. Any mitigation strategy or effective policy that addresses the negative consequences of bias must start with awareness that bias exists, together with a way to understand and quantify it. However, there is a lack of consensus on how to measure data bias and oftentimes the intended meaning is context dependent and not uniform within the research community. The main contributions of our work are: (1) The definition of Uniform Bias (UB), the first bias measure with a clear and simple interpretation in the full range of bias values. (2) A systematic study to characterize the flaws of existing measures in the context of anti-employment-discrimination rules used by the Office of Federal Contract Compliance Programs, additionally showing how UB solves open problems in this domain. (3) A framework that provides an efficient way to derive a mathematical formula for a bias measure based on an algorithmic specification of bias addition. Our results are experimentally validated using nine publicly available datasets and theoretically analyzed, which provides novel insights about the problem. Based on our approach, we also design a bias mitigation model that might be useful to policymakers.
- [35] arXiv:2406.00799 (replaced) [pdf, html, other]
-
Title: Are you still on track!? Catching LLM Task Drift with Activations
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Computers and Society (cs.CY)
Large Language Models are commonly used in retrieval-augmented applications to execute user instructions based on data from external sources. For example, modern search engines use LLMs to answer queries based on relevant search results; email plugins summarize emails by processing their content through an LLM. However, the potentially untrusted provenance of these data sources can lead to prompt injection attacks, where the LLM is manipulated by natural language instructions embedded in the external data, causing it to deviate from the user's original instruction(s). We define this deviation as task drift. Task drift is a significant concern as it allows attackers to exfiltrate data or influence the LLM's output for other users. We study LLM activations as a solution to detect task drift, showing that activation deltas - the difference in activations before and after processing external data - are strongly correlated with this phenomenon. Through two probing methods, we demonstrate that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set. We evaluate these methods by making minimal assumptions about how users' tasks, system prompts, and attacks can be phrased. We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions, without being trained on any of these attacks. Interestingly, the fact that this solution does not require any modifications to the LLM (e.g., fine-tuning), as well as its compatibility with existing meta-prompting solutions, makes it cost-efficient and easy to deploy. To encourage further research on activation-based task inspection, decoding, and interpretability, we release our large-scale TaskTracker toolkit, featuring a dataset of over 500K instances, representations from six SoTA language models, and inspection tools.
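The idea of probing activation deltas with a linear classifier can be sketched in a few lines of plain Python. Everything below is an illustrative assumption: the vectors are synthetic stand-ins for real LLM activations, the "drift" is simulated as a shift along one dimension, and the helper names are hypothetical. It does not reproduce the paper's probes or the TaskTracker toolkit.

```python
import math
import random


def activation_delta(before, after):
    """Element-wise difference: activations after minus before external data."""
    return [a - b for a, b in zip(after, before)]


def train_linear_probe(deltas, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    weights = [0.0] * len(deltas[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(deltas, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of drift
            err = p - y                      # gradient of log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, delta):
    """Return 1 (drift) if the linear score is positive, else 0 (clean)."""
    score = sum(w * xi for w, xi in zip(weights, delta)) + bias
    return 1 if score > 0 else 0


# Synthetic data (assumption): drifted examples are shifted along dimension 0.
rng = random.Random(0)
deltas, labels = [], []
for i in range(40):
    before = [rng.gauss(0.0, 1.0) for _ in range(4)]
    after = [b + rng.gauss(0.0, 0.1) for b in before]
    drifted = i % 2
    if drifted:
        after[0] += 1.0
    deltas.append(activation_delta(before, after))
    labels.append(drifted)

weights, bias = train_linear_probe(deltas, labels)
correct = sum(predict(weights, bias, d) == y for d, y in zip(deltas, labels))
accuracy = correct / len(labels)
print(f"training accuracy on synthetic deltas: {accuracy:.2f}")
```

The probe only reads the deltas, so the model that produced the activations stays untouched, which is the property the abstract credits for easy deployment.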
- [36] arXiv:2406.06007 (replaced) [pdf, html, other]
-
Title: CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao
Comments: NeurIPS 2024 Datasets and Benchmarks Track
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions: trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at this https URL.
- [37] arXiv:2407.03059 (replaced) [pdf, html, other]
-
Title: FairJob: A Real-World Dataset for Fairness in Online Systems
Comments: NeurIPS 2024, 28 pages, 15 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
We introduce a fairness-aware dataset for job recommendations in advertising, designed to foster research in algorithmic fairness within real-world scenarios. It was collected and prepared to comply with privacy standards and business confidentiality. An additional challenge is the lack of access to protected user attributes such as gender, for which we propose a solution to obtain a proxy estimate. Despite being anonymized and including a proxy for a sensitive attribute, our dataset preserves predictive power and maintains a realistic and challenging benchmark. This dataset addresses a significant gap in the availability of fairness-focused resources for high-impact domains like advertising, where the actual impact is access (or lack of access) to precious employment opportunities and where balancing fairness and utility is a common industrial challenge. We also explore various stages in the advertising process where unfairness can occur and introduce a method to compute, from a biased dataset, a fair utility metric for job recommendations in online systems. Experimental evaluations of bias mitigation techniques on the released dataset demonstrate potential improvements in fairness and the associated trade-offs with utility.
The dataset is hosted at this https URL. Source code for the experiments is hosted at this https URL.
- [38] arXiv:2410.17225 (replaced) [pdf, html, other]
-
Title: Dhoroni: Exploring Bengali Climate Change and Environmental Views with a Multi-Perspective News Dataset and Natural Language Processing
Comments: In Review
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
Climate change poses critical challenges globally, disproportionately affecting low-income countries that often lack resources and linguistic representation on the international stage. Despite Bangladesh's status as one of the most vulnerable nations to climate impacts, research gaps persist in Bengali-language studies related to climate change and NLP. To address this disparity, we introduce Dhoroni, a novel Bengali (Bangla) climate change and environmental news dataset, comprising 2300 annotated Bangla news articles and offering multiple perspectives such as political influence, scientific/statistical data, authenticity, stance detection, and stakeholder involvement. Furthermore, we present an in-depth exploratory analysis of Dhoroni and introduce the BanglaBERT-Dhoroni family, a novel baseline model family for climate and environmental opinion detection in Bangla, fine-tuned on our dataset. This research contributes significantly to enhancing accessibility and analysis of climate discourse in Bengali (Bangla), addressing crucial communication and research gaps in climate-impacted regions like Bangladesh, a country of 180 million people.
- [39] arXiv:2410.20746 (replaced) [pdf, html, other]
-
Title: ElectionSim: Massive Population Election Simulation Powered by Large Language Model Driven Agents
Xinnong Zhang, Jiayu Lin, Libo Sun, Weihong Qi, Yihang Yang, Yue Chen, Hanjia Lyu, Xinyi Mou, Siming Chen, Jiebo Luo, Xuanjing Huang, Shiping Tang, Zhongyu Wei
Comments: 42 pages, 14 figures
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Massive population election simulation aims to model the preferences of specific groups in particular election scenarios. It has garnered significant attention for its potential to forecast real-world social trends. Traditional agent-based modeling (ABM) methods are constrained by their limited ability to incorporate complex individual background information and provide interactive prediction results. In this paper, we introduce ElectionSim, an innovative election simulation framework based on large language models, designed to support accurate voter simulations and customized distributions, together with an interactive platform to dialogue with simulated voters. We present a million-level voter pool sampled from social media platforms to support accurate individual simulation. We also introduce PPE, a poll-based presidential election benchmark to assess the performance of our framework under the U.S. presidential election scenario. Through extensive experiments and analyses, we demonstrate the effectiveness and robustness of our framework in U.S. presidential election simulations.