Quantifying the Burden of Exploration and the Unfairness of Free Riding

Jung, Christopher; Kannan, Sampath; Lutz, Neil

arXiv:1810.08743 (cs)

[Submitted on 20 Oct 2018 (v1), last revised 4 Feb 2022 (this version, v5)]

Abstract: We consider the multi-armed bandit setting with a twist. Rather than having just one decision maker deciding which arm to pull in each round, we have $n$ different decision makers (agents). In the simple stochastic setting, we show that a "free-riding" agent observing another "self-reliant" agent can achieve just $O(1)$ regret, as opposed to the regret lower bound of $\Omega (\log t)$ when one decision maker is playing in isolation. This result holds whenever the self-reliant agent's strategy satisfies either one of two assumptions: (1) each arm is pulled at least $\gamma \ln t$ times in expectation for a constant $\gamma$ that we compute, or (2) the self-reliant agent achieves $o(t)$ realized regret with high probability. Both of these assumptions are satisfied by standard zero-regret algorithms. Under the second assumption, we further show that the free rider only needs to observe the number of times each arm is pulled by the self-reliant agent, and not the rewards realized.
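The stochastic setting above can be illustrated with a small simulation. This is a minimal sketch and not the authors' construction: it assumes Bernoulli rewards, uses the textbook UCB1 rule for the self-reliant agent, and uses a hypothetical free rider that simply copies whichever arm the self-reliant agent has pulled most often so far, which relies only on pull counts as in the second assumption. The names `ucb1_arm` and `simulate` are illustrative.

```python
import math
import random


def ucb1_arm(counts, sums, t):
    """Return the arm UCB1 would pull at round t (1-indexed)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # pull every arm once before using the index
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2 * math.log(t) / counts[a]),
    )


def simulate(means, horizon, seed=0):
    """Run a self-reliant UCB1 agent next to a count-copying free rider.

    Assumes Bernoulli arms with the given means. Returns the cumulative
    pseudo-regret of (self-reliant agent, free rider).
    """
    rng = random.Random(seed)
    best = max(means)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    regret_self_reliant = 0.0
    regret_free_rider = 0.0
    for t in range(1, horizon + 1):
        # The free rider sees only the pull counts from earlier rounds
        # and plays the arm pulled most often (ties go to the lowest index).
        fr_arm = max(range(len(means)), key=lambda a: counts[a])
        regret_free_rider += best - means[fr_arm]

        # The self-reliant agent bears the exploration cost itself.
        arm = ucb1_arm(counts, sums, t)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret_self_reliant += best - means[arm]
    return regret_self_reliant, regret_free_rider
```

Calling, for example, `simulate([0.9, 0.5], 10_000)` returns the two cumulative pseudo-regrets, which can be compared across several horizons to see how each one grows.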
In the linear contextual setting, each arm has a distribution over parameter vectors, each agent has a context vector, and the reward realized when an agent pulls an arm is the inner product of that agent's context vector with a parameter vector sampled from the pulled arm's distribution. We show that the free rider can achieve $O(1)$ regret in this setting whenever the free rider's context is a small (in $L_2$-norm) linear combination of other agents' contexts and all other agents pull each arm $\Omega (\log t)$ times with high probability. Again, this condition on the self-reliant players is satisfied by standard zero-regret algorithms like UCB. We also prove a number of lower bounds.
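One way to see why the linear-combination condition is useful is linearity: if the free rider's context is $\sum_i \alpha_i c_i$, then its expected reward on an arm is the same combination $\sum_i \alpha_i \, \mathbb{E}[r_i]$ of the other agents' expected rewards on that arm. The sketch below checks this identity on made-up numbers. It is an illustration under assumed values, not the paper's algorithm, and the helper names (`dot`, `combine_contexts`, `free_rider_estimates`) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def combine_contexts(coeffs, contexts):
    """Return sum_i coeffs[i] * contexts[i] as a vector."""
    dim = len(contexts[0])
    return [sum(alpha * c[k] for alpha, c in zip(coeffs, contexts))
            for k in range(dim)]


def free_rider_estimates(coeffs, mean_rewards):
    """Combine other agents' per-arm mean rewards with the same weights.

    mean_rewards[i][a] is agent i's (empirical or expected) mean reward
    on arm a; coeffs[i] is the weight of agent i's context.
    """
    n_arms = len(mean_rewards[0])
    return [sum(alpha * m[a] for alpha, m in zip(coeffs, mean_rewards))
            for a in range(n_arms)]


# Illustrative values only (assumptions, not data from the paper).
contexts = [[1, 0], [0, 1]]   # contexts of two self-reliant agents
coeffs = [2, -1]              # free rider's context is 2*c_1 - c_2
arm_means = [[3, 1], [1, 4]]  # mean parameter vector of each arm

# Expected reward of each self-reliant agent on each arm.
expected = [[dot(c, mu) for mu in arm_means] for c in contexts]

free_context = combine_contexts(coeffs, contexts)
via_others = free_rider_estimates(coeffs, expected)
direct = [dot(free_context, mu) for mu in arm_means]
assert via_others == direct  # linearity of the inner product
```

In practice the other agents' means would be empirical estimates, so any estimation error is scaled by the coefficients; this is one plausible reading of why the abstract asks for a combination that is small in $L_2$-norm.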

Subjects:	Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Cite as:	arXiv:1810.08743 [cs.LG]
	(or arXiv:1810.08743v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1810.08743

Submission history

From: Christopher Jung
[v1] Sat, 20 Oct 2018 03:08:52 UTC (15 KB)
[v2] Wed, 23 Jan 2019 22:22:35 UTC (136 KB)
[v3] Wed, 17 Jul 2019 01:16:56 UTC (98 KB)
[v4] Tue, 22 Sep 2020 17:29:49 UTC (306 KB)
[v5] Fri, 4 Feb 2022 15:57:48 UTC (306 KB)
