Current and Future Trends in Stochastic Thermodynamics
4-29 September 2017, Nordita, Stockholm

Infomax Strategies for an Optimal Balance Between Exploration and Exploitation
A proper balance between exploitation and exploration is what
makes decisions good, in the sense of achieving high reward
such as payoff or evolutionary fitness. The Infomax principle
postulates that maximizing information directs the function
of diverse systems, from living systems to artificial neural
networks. Although specific applications have proven
successful, the validity of information as a proxy for
reward remains unclear. Here, we consider the multi-armed
bandit decision problem, which features arms (slot machines)
with unknown probabilities of success and a player who tries
to maximize cumulative payoff by choosing the sequence of
arms to play. We show that an Infomax strategy that optimally
gathers information on the highest probability of success
among the arms saturates known optimal bounds and compares
favorably with existing policies. Conversely, gathering
information on the identity of the best arm in the bandit
leads to a strategy that is vastly suboptimal in terms of
payoff. The nature of the quantity targeted by Infomax
acquisition is therefore crucial for an effective tradeoff
between exploration and exploitation.
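To make the contrast between the two Infomax targets concrete, here is a minimal Python sketch of a Bernoulli multi-armed bandit with an independent Beta(1, 1) prior on each arm. The greedy one-step rule (pull the arm whose outcome is expected to reduce the target entropy the most), the 100-bin grid, and every function name below are illustrative assumptions, not the presenter's implementation. `entropy_of_max` measures uncertainty about the highest success probability, theta_max = max_k p_k; `entropy_of_best_identity` measures uncertainty about which arm is best.

```python
import math
import random

# Hypothetical discretization of [0, 1] into 100 bins of width 0.01.
GRID = [i / 100 for i in range(101)]


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P[Binomial(a + b - 1, x) >= a],
    which is exact for the integer counts produced by a Beta(1, 1) prior.
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_of_max(params):
    """Entropy of the binned posterior of theta_max = max_k p_k.

    With independent Beta posteriors, P(theta_max <= x) = prod_k I_x(a_k, b_k).
    """
    cdf = [math.prod(beta_cdf(x, a, b) for a, b in params) for x in GRID]
    return entropy([hi - lo for lo, hi in zip(cdf, cdf[1:])])


def entropy_of_best_identity(params):
    """Entropy of the posterior over which arm is best (grid approximation)."""
    cdfs = [[beta_cdf(x, a, b) for x in GRID] for a, b in params]
    weights = []
    for k, cdf_k in enumerate(cdfs):
        # P(arm k is best) ~ sum over bins of P(p_k in bin) * P(every other
        # arm lies below the bin's upper edge); this slightly overcounts,
        # so the weights are renormalized below.
        w = 0.0
        for i in range(len(GRID) - 1):
            others = math.prod(c[i + 1] for j, c in enumerate(cdfs) if j != k)
            w += (cdf_k[i + 1] - cdf_k[i]) * others
        weights.append(w)
    total = sum(weights)
    return entropy([w / total for w in weights])


def expected_gain(params, k, target):
    """Expected one-step reduction of target(params) if arm k is pulled."""
    a, b = params[k]
    p_win = a / (a + b)  # posterior predictive probability of a success
    after_win = params[:k] + [(a + 1, b)] + params[k + 1:]
    after_loss = params[:k] + [(a, b + 1)] + params[k + 1:]
    expected_after = p_win * target(after_win) + (1 - p_win) * target(after_loss)
    return target(params) - expected_after


def play(true_probs, steps, target, seed=0):
    """Greedy Infomax play on a Bernoulli bandit.

    Returns (cumulative payoff, final Beta parameters per arm).
    """
    rng = random.Random(seed)
    params = [(1, 1)] * len(true_probs)  # uniform Beta(1, 1) prior on each arm
    payoff = 0
    for _ in range(steps):
        # Pull the arm whose outcome is expected to be most informative.
        k = max(range(len(params)), key=lambda i: expected_gain(params, i, target))
        won = rng.random() < true_probs[k]
        a, b = params[k]
        params[k] = (a + 1, b) if won else (a, b + 1)
        payoff += won
    return payoff, params
```

For example, `play([0.2, 0.5, 0.8], 100, entropy_of_max)` and the same call with `entropy_of_best_identity` can be compared on cumulative payoff; the sketch itself makes no claim about the outcome at any particular setting. Binning theta_max on a fixed grid shifts every entropy by roughly the same constant (the log of the bin width), so the information gains used to pick arms are essentially insensitive to that choice.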
Id: 264
Place: Nordita, Stockholm
Room: 122:026
Starting date: 18-Sep-2017 14:00
Duration: 01h00'
Presenters: CELANI, Antonio

