Data Valuation for Vertical Federated Learning: A Model-free and Privacy-preserving Method

Han, Xiao; Wang, Leye; Wu, Junjie; Fang, Xiao

arXiv:2112.08364 (cs)

[Submitted on 15 Dec 2021 (v1), last revised 4 Jan 2024 (this version, v3)]

Abstract: Vertical federated learning (VFL) is a promising paradigm for predictive analytics, empowering an organization (i.e., the task party) to enhance its predictive models through collaboration with multiple data suppliers (i.e., data parties) in a decentralized and privacy-preserving way. Despite the fast-growing interest in VFL, the lack of effective and secure tools for assessing the value of data owned by data parties hinders the application of VFL in business contexts. In response, we propose FedValue, a privacy-preserving, task-specific but model-free data valuation method for VFL, which consists of a data valuation metric and a federated computation method. Specifically, we first introduce a novel data valuation metric, namely MShapley-CMI. The metric evaluates a data party's contribution to a predictive analytics task without needing to execute a machine learning model, making it well suited to real-world applications of VFL. Next, we develop a federated computation method that calculates the MShapley-CMI value for each data party in a privacy-preserving manner. Extensive experiments conducted on six public datasets validate the efficacy of FedValue for data valuation in the context of VFL. In addition, we illustrate the practical utility of FedValue with a case study involving federated movie recommendations.
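
The abstract names MShapley-CMI but does not give its formula. As an illustration only, the sketch below assumes a plain Shapley value over a conditional-mutual-information value function, v(S) = I(Y; X_S | X_task), estimated from discrete features held in one place. The function names (`entropy`, `cmi`, `shapley_cmi`) and the toy data are hypothetical, the paper's actual metric may differ, and none of the privacy-preserving federated computation is modeled here.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def entropy(rows):
    """Empirical Shannon entropy (bits) of a list of equal-length tuples."""
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())


def cmi(y, x, z):
    """Empirical I(Y; X | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    xz = [a + c for a, c in zip(x, z)]
    yz = [b + c for b, c in zip(y, z)]
    xyz = [a + b + c for a, b, c in zip(x, y, z)]
    return entropy(xz) + entropy(yz) - entropy(xyz) - entropy(z)


def shapley_cmi(label, task_feats, party_feats):
    """Shapley value of each data party under v(S) = I(Y; X_S | X_task).

    label:       list of discrete labels held by the task party
    task_feats:  list of tuples, the task party's own features per row
    party_feats: dict mapping party name -> list of feature tuples per row
    This value function is an assumption, not the paper's definition.
    """
    names = list(party_feats)
    n = len(names)
    y = [(v,) for v in label]

    def joint(subset):
        # Concatenate the feature tuples of all parties in the subset, row by row.
        return [sum((party_feats[p][i] for p in subset), ())
                for i in range(len(label))]

    def value(subset):
        return cmi(y, joint(subset), task_feats) if subset else 0.0

    phi = {}
    for p in names:
        others = [q for q in names if q != p]
        total = 0.0
        for k in range(n):
            # Shapley weight |S|!(n-|S|-1)!/n! written as 1 / (n * C(n-1, |S|)).
            weight = 1.0 / (n * comb(n - 1, k))
            for s in combinations(others, k):
                total += weight * (value(s + (p,)) - value(s))
        phi[p] = total
    return phi


# Toy data (hypothetical): party A's feature equals the label, party B's is independent.
phi = shapley_cmi(
    label=[0, 0, 1, 1],
    task_feats=[(0,)] * 4,
    party_feats={"A": [(0,), (0,), (1,), (1,)],
                 "B": [(0,), (1,), (0,), (1,)]},
)
print(phi)
```

On this toy data, party A alone determines the label and party B adds nothing, so A receives the full 1 bit of value and B receives 0. No model is trained at any point, which is the sense in which a valuation of this kind is model-free.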

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2112.08364 [cs.LG]
	(or arXiv:2112.08364v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2112.08364

Submission history

From: Xiao Han
[v1] Wed, 15 Dec 2021 02:42:28 UTC (1,229 KB)
[v2] Mon, 1 Jan 2024 03:11:53 UTC (2,547 KB)
[v3] Thu, 4 Jan 2024 07:19:17 UTC (2,547 KB)
