Aggregating Data for Optimal and Private Learning

Agarwal, Sushant; Makhija, Yukti; Saket, Rishi; Raghuveer, Aravindan

arXiv:2411.19045 (cs)

[Submitted on 28 Nov 2024]

Abstract: Multiple Instance Regression (MIR) and Learning from Label Proportions (LLP) are learning frameworks arising in many applications, where the training data is partitioned into disjoint sets, or bags, and only an aggregate label, i.e., a bag-label, for each bag is available to the learner. In the case of MIR, the bag-label is the label of an undisclosed instance from the bag, while in LLP, the bag-label is the mean of the bag's labels. In this paper, we study, for various loss functions in MIR and LLP, the optimal way to partition the dataset into bags such that the utility for downstream tasks like linear regression is maximized. We provide theoretical utility guarantees and show that, in each case, the optimal bagging strategy (approximately) reduces to finding an optimal clustering of the feature vectors or the labels with respect to natural objectives such as $k$-means. We also show that our bagging mechanisms can be made label-differentially private at the cost of an additional utility error. We then generalize our results to the setting of Generalized Linear Models (GLMs). Finally, we validate our theoretical results experimentally.
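The bag-label definitions and the clustering view described in the abstract can be illustrated with a small sketch. This is an illustrative example under our own assumptions, not the authors' algorithm or code. All function names (`sorted_bags`, `random_bags`, `kmeans_cost`, `llp_bag_label`, `mir_bag_label`, `private_llp_bag_label`) are hypothetical. Labels are assumed to be scalars in a known range [lo, hi], MIR's undisclosed instance is assumed to be picked uniformly at random, and the privacy step is the standard Laplace mechanism rather than the paper's mechanism. The sketch compares the $k$-means cost of the labels under random bagging with that of a label-clustering heuristic that sorts the labels and cuts them into contiguous, equal-size bags.

```python
import random
from statistics import fmean


def kmeans_cost(bags):
    """k-means objective of a partition: squared deviation of each label from its bag mean."""
    total = 0.0
    for bag in bags:
        mu = fmean(bag)
        total += sum((y - mu) ** 2 for y in bag)
    return total


def random_bags(labels, bag_size, rng):
    """Baseline (hypothetical): shuffle the labels and cut them into bags of `bag_size`."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + bag_size] for i in range(0, len(shuffled), bag_size)]


def sorted_bags(labels, bag_size):
    """Label-clustering heuristic (hypothetical): sort labels, then cut into contiguous bags."""
    ordered = sorted(labels)
    return [ordered[i:i + bag_size] for i in range(0, len(ordered), bag_size)]


def llp_bag_label(bag):
    """LLP: the bag-label is the mean of the bag's labels."""
    return fmean(bag)


def mir_bag_label(bag, rng):
    """MIR: the bag-label is the label of one undisclosed instance (uniform pick is an assumption)."""
    return rng.choice(bag)


def laplace(scale, rng):
    """Sample Laplace(0, scale) as scale times the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_llp_bag_label(bag, lo, hi, epsilon, rng):
    """Laplace mechanism on the bag mean, assuming every label lies in [lo, hi].

    Changing one label moves the mean by at most (hi - lo) / len(bag),
    which is the sensitivity used to scale the noise. This protects the
    released mean for a FIXED partition only.
    """
    sensitivity = (hi - lo) / len(bag)
    return llp_bag_label(bag) + laplace(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [rng.uniform(0.0, 1.0) for _ in range(120)]
    # Clustered bags keep labels close to their bag mean; random bags do not.
    print(kmeans_cost(sorted_bags(labels, 10)) <= kmeans_cost(random_bags(labels, 10, rng)))  # prints True
    bags = sorted_bags(labels, 10)
    # Noisy LLP bag-labels for the first three bags (epsilon = 1.0).
    print([round(private_llp_bag_label(b, 0.0, 1.0, 1.0, rng), 3) for b in bags[:3]])
```

For equal-size bags of scalar labels, the sort-and-chunk partition minimizes this within-bag squared error: swapping a smaller label out of a higher-sum bag for a larger label from a lower-sum bag always lowers the cost, so an optimal partition is contiguous in sorted order. The Laplace noise in this sketch covers only the released bag means. Because `sorted_bags` itself depends on the labels, releasing that partition would need its own privacy treatment, which the sketch does not attempt.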

Comments:	36 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2411.19045 [cs.LG]
	(or arXiv:2411.19045v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.19045

Submission history

From: Yukti Makhija
[v1] Thu, 28 Nov 2024 10:44:00 UTC (34 KB)