Flynn, Hamish (2023)
PAC-Bayesian Bandit Algorithms With Guarantees.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00024778
Ph.D. Thesis, Primary publication, Publisher's Version
Full text: thesis-flynn.pdf (2 MB). License: CC BY-SA 4.0 International (Creative Commons Attribution-ShareAlike).
Item Type: | Ph.D. Thesis
---|---
Type of entry: | Primary publication
Title: | PAC-Bayesian Bandit Algorithms With Guarantees
Language: | English
Referees: | Peters, Prof. Jan; Kandemir, Prof. Melih; Guedj, Prof. Benjamin
Date: | 20 November 2023
Place of Publication: | Darmstadt
Collation: | ix, 167 pages
Date of oral examination: | 18 October 2023
DOI: | 10.26083/tuprints-00024778
Abstract: | PAC-Bayes is a mathematical framework that can be used to provide performance guarantees for machine learning algorithms, explain why specific machine learning algorithms work well, and design new machine learning algorithms. Since the first PAC-Bayesian theorems were proven in the late 1990s, several impressive milestones have been achieved. PAC-Bayes generalisation bounds have been used to prove tight error bounds for deep neural networks. In addition, PAC-Bayes bounds have been used to explain why machine learning principles such as large margin classification and preference for flat minima of a loss function work well. However, these milestones were achieved in simple supervised learning problems. In this thesis, inspired by the success of the PAC-Bayes framework in supervised learning settings, we investigate the potential of the PAC-Bayes framework as a tool for designing and analysing bandit algorithms. First, we provide a comprehensive overview of PAC-Bayes bounds for bandit problems and an experimental comparison of these bounds. Previous works focused on PAC-Bayes bounds for martingales and their application to importance sampling-based estimates of the reward or regret of a policy. On the one hand, we found that these PAC-Bayes bounds are a useful tool for designing offline policy search algorithms with performance guarantees. In our experiments, a PAC-Bayesian offline policy search algorithm was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees. On the other hand, the PAC-Bayesian online policy search algorithms that we tested had underwhelming performance and loose cumulative regret bounds. Next, we present novel PAC-Bayes-style algorithms with worst-case regret bounds for linear bandit problems. We combine PAC-Bayes bounds with the "optimism in the face of uncertainty" principle, which reduces a stochastic bandit problem to the construction of a confidence sequence for the unknown reward function. We use a novel PAC-Bayes-style tail bound for adaptive martingale mixtures to construct convex PAC-Bayes-style confidence sequences for (sparse) linear bandits. We show that (sparse) linear bandit algorithms based on our PAC-Bayes-style confidence sequences are guaranteed to achieve competitive worst-case regret. We also show that our confidence sequences yield confidence bounds that are tighter than competitors, both empirically and theoretically. Finally, we demonstrate that our tighter PAC-Bayes-style confidence bounds result in bandit algorithms with improved cumulative regret.
Status: | Publisher's Version
URN: | urn:nbn:de:tuda-tuprints-247787
Classification DDC: | 000 Generalities, computers, information > 004 Computer science
Divisions: | 20 Department of Computer Science > Intelligent Autonomous Systems
Date Deposited: | 20 Nov 2023 14:40
Last Modified: | 22 Nov 2023 06:46
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/24778
PPN: | 513359427
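
The abstract above takes PAC-Bayes generalisation bounds as its starting point. For orientation, the classical McAllester-style bound (a standard textbook result, not one of the new bounds developed in the thesis) shows the shape such guarantees take:

```latex
% Classical PAC-Bayes generalisation bound (McAllester, 1999; Maurer, 2004).
% P is a prior over hypotheses fixed before seeing the data, Q is any
% posterior, L is the 0-1 risk, and \hat{L}_n is its empirical counterpart
% on n i.i.d. examples. With probability at least 1 - \delta, for all Q:
\mathbb{E}_{h \sim Q}\left[L(h)\right] \;\le\;
\mathbb{E}_{h \sim Q}\left[\hat{L}_n(h)\right]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

The bound holds simultaneously for all posteriors Q, which is what lets a learning algorithm optimise over Q while retaining the guarantee; the thesis extends this style of bound from i.i.d. supervised data to the dependent data generated by bandit interaction.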
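The abstract also describes combining the "optimism in the face of uncertainty" (OFU) principle with confidence sequences, reducing a stochastic linear bandit to the construction of a confidence set for the unknown reward parameter. The thesis's PAC-Bayes-style confidence sequences are not reproduced here; the following minimal Python sketch instead plugs the standard ridge-regression confidence ellipsoid (Abbasi-Yadkori et al., 2011, in simplified form) into the OFU reduction. The function name and all parameters are illustrative assumptions, not code from the thesis.

```python
import numpy as np

def ofu_linear_bandit(arms, theta_star, T, lam=1.0, delta=0.05, S=1.0, R=0.1, seed=0):
    """OFU for a stochastic linear bandit with a finite arm set.

    Sketch only: the exploration bonus uses a simplified ridge-regression
    ellipsoid radius, not the PAC-Bayes-style confidence sequences
    constructed in the thesis.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)               # regularised Gram matrix V_t
    b = np.zeros(d)                   # sum of reward-weighted actions
    best = (arms @ theta_star).max()  # best achievable mean reward
    regret = 0.0
    for t in range(1, T + 1):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b         # ridge-regression estimate of theta*
        # Simplified confidence radius: R = noise scale, S = bound on ||theta*||.
        beta = R * np.sqrt(d * np.log((1.0 + t / lam) / delta)) + np.sqrt(lam) * S
        # Optimistic index per arm: estimated reward + exploration bonus.
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        a = arms[np.argmax(arms @ theta_hat + beta * widths)]
        reward = a @ theta_star + R * rng.standard_normal()
        V += np.outer(a, a)
        b += reward * a
        regret += best - a @ theta_star
    return regret

# Hypothetical usage on unit-norm random arms in dimension 5.
rng = np.random.default_rng(1)
arms = rng.standard_normal((50, 5))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta_star = rng.standard_normal(5)
theta_star /= np.linalg.norm(theta_star)
print(ofu_linear_bandit(arms, theta_star, T=2000))
```

A tighter confidence sequence shrinks the radius beta, which directly shrinks the exploration bonus and hence the regret; this is the mechanism behind the abstract's claim that tighter PAC-Bayes-style confidence bounds yield bandit algorithms with improved cumulative regret.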