Mathematics > Optimization and Control
[Submitted on 3 Jul 2022 (v1), last revised 7 Jun 2023 (this version, v2)]
Title: Myopic Quantal Response Policy: Thompson Sampling Meets Behavioral Economics
Abstract: We study a novel family of behavioral policies for the multi-armed bandit (MAB) problem, which we have termed Myopic Quantal Response (MQR). MQR prescribes a simple way to randomize over arms according to historical rewards and a "coefficient of exploitation," which explicitly manages the exploration-exploitation trade-off. MQR is a dynamic adaptation of quantal response models where the anticipated utilities are directly derived from past rewards. Furthermore, it can be viewed as a generalization of the Thompson Sampling (TS) algorithm. We develop an asymptotic theory for MQR and show how it can help understand not only asymptotically optimal policies like TS, but also those that are suboptimal due to "under" or "over" exploring. In the non-asymptotic setup, we demonstrate how MQR can be used as a structural estimation tool: Given observed data (i.e., realized actions and rewards), we can estimate the implied coefficient of exploitation of any given policy (either generated by human beings or algorithms). This allows us to diagnose whether and to what extent the policy underexplores or overexplores.
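To make the idea of randomizing over arms from past rewards concrete, here is a minimal sketch of a generic logit (softmax) quantal-response rule applied to empirical mean rewards. It is not the paper's MQR definition, which the abstract does not state. The function names, the Gaussian reward model, and the parameter `beta` (a hypothetical stand-in for a "coefficient of exploitation") are all assumptions made for illustration.

```python
import math
import random


def quantal_response_probs(mean_rewards, beta):
    """Logit choice probabilities over arms (illustrative, not the paper's MQR).

    beta is a hypothetical exploitation parameter: beta = 0 gives uniform
    randomization, and larger beta concentrates mass on the arm with the
    highest empirical mean.
    """
    best = max(mean_rewards)  # subtract the max for numerical stability
    weights = [math.exp(beta * (r - best)) for r in mean_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(true_means, beta, horizon, seed=0):
    """Simulate the rule on Gaussian arms (assumed unit variance).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so each empirical mean is defined
        else:
            means = [sums[i] / counts[i] for i in range(k)]
            probs = quantal_response_probs(means, beta)
            arm = rng.choices(range(k), weights=probs)[0]
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

In this sketch, a small `beta` keeps sampling arms that look worse (more exploration), while a large `beta` quickly commits to the current empirical leader (more exploitation). This is the kind of trade-off the abstract says the coefficient of exploitation manages.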
Submission history
From: Jingying Ding
[v1] Sun, 3 Jul 2022 12:57:13 UTC (319 KB)
[v2] Wed, 7 Jun 2023 12:27:16 UTC (186 KB)