Showing items 1 - 4 of 4
  • Publication
    Open access
    Isoperimetry is All We Need: Langevin Posterior Sampling for RL
    (2024)
    Emilio Jorge
    ;
    Debabrota Basu
    In Reinforcement Learning theory, we often rely on restrictive assumptions, like linearity and RKHS structure on the model, or Gaussianity and log-concavity of the posteriors over models, to design an algorithm with provably sublinear regret. In this paper, we study whether we can design efficient low-regret RL algorithms for any isoperimetric distribution, which includes and extends the standard setups in the literature. Specifically, we show that the well-known PSRL (Posterior Sampling-based RL) algorithm yields sublinear regret if the posterior distributions satisfy the Log-Sobolev Inequality (LSI), which is a form of isoperimetry. Further, for the cases where we cannot compute or sample from an exact posterior, we propose a Langevin sampling-based algorithm design scheme, namely LaPSRL. We show that LaPSRL also achieves sublinear regret if the posteriors only satisfy LSI. Finally, we deploy a version of LaPSRL with a Langevin sampling algorithm, SARAH-LD. We numerically demonstrate its performance in different bandit and MDP environments. Experimental results validate the generality of LaPSRL across environments and its competitive performance with respect to the baselines.
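The Langevin step at the heart of this line of work can be illustrated on a toy conjugate-Gaussian model. The sketch below is hypothetical and is not the paper's LaPSRL or SARAH-LD implementation: it runs unadjusted Langevin dynamics on a posterior that is Gaussian (hence log-concave and satisfying an LSI) and checks that the iterates track the exact posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: prior N(0, 1) on a scalar reward parameter theta,
# unit-noise Gaussian observations. The exact posterior is Gaussian, so it
# is log-concave and satisfies a Log-Sobolev Inequality.
obs = rng.normal(2.0, 1.0, size=50)
n = len(obs)
post_var = 1.0 / (1.0 + n)        # conjugate posterior variance
post_mean = post_var * obs.sum()  # conjugate posterior mean

def grad_log_post(theta):
    """Gradient of the log-posterior: -(theta - mean)/var for a Gaussian."""
    return -(theta - post_mean) / post_var

# Unadjusted Langevin dynamics:
#   theta <- theta + eta * grad_log_post(theta) + sqrt(2 * eta) * noise
eta = 1e-3
theta = 0.0
samples = []
for k in range(20000):
    theta = theta + eta * grad_log_post(theta) + np.sqrt(2 * eta) * rng.normal()
    if k > 5000:  # discard burn-in
        samples.append(theta)

print(abs(np.mean(samples) - post_mean))  # small: the chain tracks the posterior
```

Here the gradient of the log-posterior is available in closed form; in the settings the paper targets, one would only need stochastic gradient estimates of it, which is what motivates variance-reduced samplers such as SARAH-LD.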
  • Publication
    Open access
    Minimax-Bayes Reinforcement Learning
    (2023)
    Thomas Kleine Buening
    ;
    Hannes Eriksson
    ;
    Divya Grover
    ;
    Emilio Jorge
    While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
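The notion of a worst-case (least favorable) prior can be made concrete in a minimal two-environment example. The sketch below is purely illustrative and not taken from the paper: a policy commits to one action, its risk is its regret in each environment, and a grid search finds the prior that maximizes the Bayes risk of the Bayes-optimal action; note the resulting prior is not uniform.

```python
import numpy as np

# Hypothetical two-environment, two-action example.
# Environment A has rewards (1, 0), environment B has rewards (0, 3),
# so the per-environment regrets of the two actions are:
regret = np.array([[0.0, 1.0],   # regrets of actions 0, 1 in environment A
                   [3.0, 0.0]])  # regrets of actions 0, 1 in environment B

priors = np.linspace(0.0, 1.0, 10001)  # prior probability of environment A

# Expected regret of each action under each prior, then the Bayes risk
# (the expected regret of the Bayes-optimal action).
exp_regret = priors[:, None] * regret[0] + (1 - priors[:, None]) * regret[1]
bayes_risk = exp_regret.min(axis=1)

# The least favorable prior maximizes the Bayes risk.
p_star = priors[bayes_risk.argmax()]
print(p_star)  # 0.75, not the uniform prior 0.5
```

The asymmetry in the regrets (1 vs. 3) pulls the worst-case prior away from uniform, which mirrors the abstract's observation that the worst-case prior depends on the setting.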
  • Publication
    Open access
    Minimax-Bayes Reinforcement Learning
    (PMLR, 2023)
    Thomas Kleine Buening
    ;
    Hannes Eriksson
    ;
    Divya Grover
    ;
    Emilio Jorge
    While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
  • Publication
    Open access
    Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning
    (2020)
    Hannes Eriksson
    ;
    Emilio Jorge
    ;
    Debabrota Basu
    ;
    Divya Grover
    Bayesian reinforcement learning (BRL) offers a decision-theoretic solution for reinforcement learning. While "model-based" BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search, previous Bayesian "model-free" value function distribution approaches implicitly make strong assumptions or approximations. We describe a novel Bayesian framework, Inferential Induction, for correctly inferring value function distributions from data, which leads to the development of a new class of BRL algorithms. We design an algorithm, Bayesian Backwards Induction, with this framework. We experimentally demonstrate that the proposed algorithm is competitive with respect to the state of the art.