
ergo_mdp

Ergodic economics simulations using MDP formalisms

This needs a bit of work; it's still broken.

"...Apparently so, but suppose you throw a coin enough times... suppose one day, it lands on its edge."

Legacy of Kain: Soul Reaver II

Episodic MDPs, unlike their non-episodic counterparts, have proven ergodic properties:

Bojun, Huang. "Steady State Analysis of Episodic Reinforcement Learning." Advances in Neural Information Processing Systems 33 (2020).

Peters, Ole. "The ergodicity problem in economics." Nature Physics 15.12 (2019): 1216-1221.

Moldovan, Teodor Mihai, and Pieter Abbeel. "Safe exploration in Markov decision processes." Proceedings of the 29th International Conference on Machine Learning. 2012.

\lim_{T \to \infty} \frac{1}{T}\mathop{\mathbb{E}}\left[\sum_{t = 1}^T R(s_t,a_t)\right] = V_\pi(s_0)

x' = \begin{cases} x + 0.5x, & p = \frac{1}{2} \\ x - 0.4x, & p = \frac{1}{2} \end{cases}
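For intuition, here is a minimal simulation sketch (not code from this repository; it only assumes NumPy) contrasting the ensemble average of this bet with what an individual player experiences over time:

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_rounds = 10_000, 100

# Multiplicative factor applied to wealth each round: x -> 1.5x or 0.6x with p = 1/2.
factors = rng.choice([1.5, 0.6], size=(n_players, n_rounds))
wealth = factors.prod(axis=1)  # final wealth of each player, starting from 1

print("ensemble (expected) per-round growth:", 0.5 * 1.5 + 0.5 * 0.6)                    # 1.05 > 1
print("time-average per-round growth:", np.exp(0.5 * np.log(1.5) + 0.5 * np.log(0.6)))   # ~0.949 < 1
print("median final wealth:", np.median(wealth))                                         # collapses towards 0
print("fraction of players below their starting wealth:", (wealth < 1.0).mean())
```

The expected value of the bet grows by 5% per round, yet almost every individual trajectory shrinks; this is exactly the ergodicity problem Peters describes.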


Taleb's take on this

\begin{aligned}
R\left((x,win),null\right) &= 0.5x \\
R\left((x,lose),null\right) &= -0.4x \\
R\left((x,choose),stop\right) &= 0 \\
P\left((x,win)\mid(x,choose),play\right) &= 0.5 \\
P\left((x,lose)\mid(x,choose),play\right) &= 0.5 \\
P\left((x,stopped)\mid(x,choose),stop\right) &= 1 \\
P\left((x+0.5x,choose)\mid(x,win),null\right) &= 1 \\
P\left((x-0.4x,choose)\mid(x,lose),null\right) &= 1
\end{aligned}
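A sketch of how this MDP could be written down in code (hypothetical function names; not necessarily how this repository implements it):

```python
import random

# States are (wealth, phase), with phase in {"choose", "win", "lose", "stopped"}.

def actions(state):
    _, phase = state
    return ["play", "stop"] if phase == "choose" else ["null"]

def step(state, action):
    """Sample (next_state, reward) according to the P and R defined above."""
    x, phase = state
    if phase == "choose" and action == "play":
        return ((x, "win") if random.random() < 0.5 else (x, "lose")), 0.0
    if phase == "choose" and action == "stop":
        return (x, "stopped"), 0.0
    if phase == "win":                      # R((x, win), null) = 0.5x
        return (x + 0.5 * x, "choose"), 0.5 * x
    if phase == "lose":                     # R((x, lose), null) = -0.4x
        return (x - 0.4 * x, "choose"), -0.4 * x
    return state, 0.0                       # "stopped" is absorbing

# Example: always play; 40 environment steps = 20 bets, starting with wealth 1.
state, total_reward = (1.0, "choose"), 0.0
for _ in range(40):
    action = "play" if state[1] == "choose" else "null"
    state, reward = step(state, action)
    total_reward += reward
print(state, total_reward)
```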

I would argue that most MDPs of interest are clearly non-ergodic. An MDP combined with a stochastic policy \pi is ergodic if every deterministic policy induces a Markov Reward Process that is ergodic. Almost all RL algorithms assume ergodicity; Value Iteration, the prime example, simply backs expected rewards up through the state space.
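For reference, a generic tabular value-iteration sketch (assuming a finite MDP given as dictionaries P[s][a] -> list of (prob, next_state, reward); this is not the repository's code):

```python
def value_iteration(P, gamma=0.99, tol=1e-8):
    """Back up expected rewards until the value function stops changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Each backup is an expectation over next states -- this is where
            # low-probability catastrophes get averaged away.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```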

Equivalently, we can say that an agent with a stochastic policy should be able to visit all states. The problem with assuming ergodicity is that it makes agents overoptimistic, as it effectively assumes that all possible errors and runs of bad luck are eventually recoverable. If I train as if ergodicity holds, a 99% chance of losing everything versus a 1% chance of winning big will simply average out, and the agent may well go for the high payout.
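Concretely (the 200x payout is just an illustrative number, not something modelled here): an agent maximising expected wealth evaluates such a gamble as

\mathbb{E}[x'] = 0.99 \cdot 0 + 0.01 \cdot 200x = 2x > x

and takes it, even though repeating it leads to ruin with near certainty.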

To work around the lack of ergodicity, we make absorbing states extremely unrewarding. If you break your little toy helicopter, you get a massive negative reward. The penalty has to be big enough that, given the choice between getting further on average and breaking down every so often, breaking down every so often is considered unacceptable.
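Continuing the step() sketch above (the threshold and penalty are arbitrary illustrative numbers), the usual workaround looks something like this:

```python
CRASH_PENALTY = -1e6   # must dominate anything the agent could gain by risking ruin

def shaped_step(state, action, ruin_threshold=1e-3):
    """Wrap step() so that entering the absorbing 'ruin' region is heavily punished."""
    next_state, reward = step(state, action)
    x, _ = next_state
    if x <= ruin_threshold:                # treat near-zero wealth as a crash
        return (0.0, "stopped"), reward + CRASH_PENALTY
    return next_state, reward
```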

Generally, until now, tinkering with the reward function has been considered enough: the agent learns to avoid those absorbing states, so that, eventually, the ergodic property is reclaimed.

The problem with this approach is that it is not trivial to design these arbitrary reward functions.

[Figure: Percentages of winners and losers]

[Figure: Wealth of winners and losers]

[Figure: Tree]

Well, the model is bonkers. The vast majority of the population goes broke; the probability of being extremely wealthy gets smaller and smaller (though those who stay wealthy get wealthier as things move forward). At the very end, because you cannot subdivide an individual into less than one point and let them hold infinite wealth, the whole wealth model collapses.

So what is the
