Influence-based Attributions can be Manipulated

Yadav, Chhavi; Wu, Ruihan; Chaudhuri, Kamalika

arXiv:2409.05208 (cs)

[Submitted on 8 Sep 2024 (v1), last revised 7 Oct 2024 (this version, v4)]

Abstract: Influence functions are a standard tool for attributing predictions to training data in a principled manner and are widely used in applications such as data valuation and fairness. In this work, we present realistic incentives to manipulate influence-based attributions and investigate whether these attributions can be systematically tampered with by an adversary. We show that this is indeed possible for logistic regression models trained on ResNet feature embeddings and standard tabular fairness datasets, and we provide efficient attacks with backward-friendly implementations. Our work raises questions about the reliability of influence-based attributions in adversarial circumstances. Code is available at this https URL.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.05208 [cs.LG]
	(or arXiv:2409.05208v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.05208

Submission history

From: Chhavi Yadav
[v1] Sun, 8 Sep 2024 19:52:00 UTC (1,764 KB)
[v2] Tue, 10 Sep 2024 02:58:54 UTC (1,764 KB)
[v3] Fri, 4 Oct 2024 01:54:23 UTC (1,834 KB)
[v4] Mon, 7 Oct 2024 03:13:37 UTC (1,834 KB)
