Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

Aristide Tossou; Debabrota Basu; Christos Dimitrakakis

doi:10.48550/arXiv.1905.12425

Author(s)

Aristide Tossou

Debabrota Basu

Christos Dimitrakakis

Institut d'informatique

Publication date

2019

In

Computing Research Repository (CoRR)

Vol.

1905.12425

Abstract

We study model-based reinforcement learning in an unknown finite communicating Markov decision process (MDP). We propose a simple algorithm that leverages a variance-based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{O}(\sqrt{DSAT})$ up to logarithmic factors, so our work closes the gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show that UCRL-V outperforms state-of-the-art algorithms.
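To illustrate what a variance-based confidence interval looks like, the sketch below computes an empirical Bernstein confidence radius for the mean of bounded samples. It is not the interval used by UCRL-V. It assumes the standard empirical Bernstein form of Maurer and Pontil (2009) for samples in [0, 1], and the function names `empirical_bernstein_radius` and `hoeffding_radius` are hypothetical, chosen only for this example.

```python
import math


def empirical_bernstein_radius(samples, delta):
    """One-sided empirical Bernstein radius for samples in [0, 1].

    Assumed form (Maurer and Pontil, 2009), holding with probability
    at least 1 - delta:
        sqrt(2 * V_n * ln(2/delta) / n) + 7 * ln(2/delta) / (3 * (n - 1))
    where V_n is the unbiased sample variance.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the variance")
    mean = sum(samples) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def hoeffding_radius(n, delta):
    """Variance-independent Hoeffding-style radius, using the same
    ln(2/delta) term so the two radii are directly comparable."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Illustrative data only: constant samples have zero empirical variance.
low_variance = [0.5] * 1000
bernstein = empirical_bernstein_radius(low_variance, delta=0.05)
hoeffding = hoeffding_radius(len(low_variance), delta=0.05)
```

When the empirical variance is small, the Bernstein radius shrinks at a rate close to 1/n instead of 1/sqrt(n), which is the general reason variance-aware intervals can yield tighter optimistic estimates than variance-independent ones.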

Identifiers

https://libra.unine.ch/handle/123456789/30972

10.48550/arXiv.1905.12425

arXiv:1905.12425v2

Publication type

journal article

File(s) to download

main article: 1905.12425.pdf (1.99 MB)
